SMC Wiki - User contributions [en]

GSoC/2015/Project ideas

2015-02-16T08:57:19Z

Nandaja: /* Create an Android IME */

<center>
<font color="green"> <big>'''Apart from the following ideas, you can propose your own ideas.'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ].
* If you want to propose an idea, please post it to the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam Computing].

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project builds on the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that a student successfully completed under SMC. The remaining tasks are:
# Integrate IBus-Braille with Liblouis.
# Create a table editor for Liblouis.
# Create a web version and host it.
# Add more Indian languages to Liblouis.
# Add a facility to write Braille Unicode characters directly.
# Remove the eSpeak dependency and make the module accessible through Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add Varnam support to Android as a native keyboard, similar to Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The libindic project has a spellchecker written in Python that uses a not-so-simple algorithm, but it still cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has about 150,000 words. We could expand the dictionary, but that adds little value, because in Malayalam, Tamil and similar languages new words are formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a mechanism for this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need either word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently, the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and then attempt a Python-based solution on top of the libindic framework.

The project is not about coding an existing algorithm, but about developing and implementing one.
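
To make "multi-level suffix stripping" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the libindic or Hunspell approach: the stems and suffixes are toy romanised placeholders rather than real lexicon entries, and the function name <code>split_word</code> is hypothetical. It also ignores sandhi changes at the joining point and does not restrict the order of suffixes, which a real solution would have to handle.

```python
def split_word(word, stems, suffixes, max_depth=6):
    """Split word into [stem, suffix, suffix, ...] or return None.

    Suffixes are stripped from the end, one level per recursive call,
    until a known stem is left or max_depth levels have been used.
    """
    if word in stems:
        return [word]
    if max_depth == 0:
        return None
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if len(word) > len(suffix) and word.endswith(suffix):
            head = split_word(word[:-len(suffix)], stems, suffixes, max_depth - 1)
            if head is not None:
                return head + [suffix]
    return None


# Toy data (assumption): placeholders, not a real dictionary.
STEMS = {"kutti"}
SUFFIXES = {"kal", "ude", "yum"}

print(split_word("kuttikaludeyum", STEMS, SUFFIXES))
```

A word is accepted when <code>split_word</code> returns a list. Lowering <code>max_depth</code> shows how a limit on the number of suffix levels rejects otherwise valid forms.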

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with an Indian language such as Malayalam for spell correction in editors or word processors, and understand its limitations.
# Study the nature of inflection and agglutination in Indian languages, read existing documents on the subject (ask for documents too) and note down your observations.
# Study Hunspell and other spellcheckers to see how they address this problem.
# Understand how a spellchecker works. How would you write one from scratch?
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': A working understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX, but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience with either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system for any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and to predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
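
As a rough illustration of what a language model estimates, the following Python sketch computes unsmoothed bigram probabilities from a tiny corpus. It is not the CMU Sphinx tooling and makes simplifying assumptions: the function names are hypothetical, the corpus is a placeholder, and real language models add smoothing and back-off for unseen word sequences.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count histories and word pairs, with sentence boundary markers."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])          # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts


def bigram_probability(history_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


# Placeholder corpus (assumption): single letters stand in for words.
CORPUS = ["a b", "a c", "a b"]
HISTORY, BIGRAMS = train_bigram_model(CORPUS)
print(bigram_probability(HISTORY, BIGRAMS, "a", "b"))
```

The same counting idea extends to trigrams. For Malayalam, the main work is collecting a large enough text corpus and deciding how to tokenize agglutinated words.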

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf "Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems"], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==libindic Project Based==

===libindic Project Improvements===

'''Project''':

This is a set of ideas for improving the existing libindic infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for libindic without disturbing the existing JSON-RPC API.
# Improve the Transliteration module.
# Integrate the [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with libindic to provide Webfonts support.

==== Provide REST like API for libindic ====

libindic currently provides a JSON-RPC API, which is also used by the framework's templates. JSON-RPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST-like HTTP APIs for libindic while leaving the current JSON-RPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs.
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people ask what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) is a good example.

Sample API calls:
<pre>
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>
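
The generic <code>/module/function_name</code> pattern could be served by a small dispatcher like the Python sketch below. This is only an illustration using the standard library, not libindic code: the registry, the <code>expose</code> decorator, <code>handle_request</code> and the <code>demo/reverse</code> function are all hypothetical names, and a real implementation would sit behind Flask or another web framework.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical registry mapping (module, function_name) to a Python callable.
REGISTRY = {}


def expose(module, name):
    """Register a function so it can be reached as /module/name."""
    def decorator(func):
        REGISTRY[(module, name)] = func
        return func
    return decorator


@expose("demo", "reverse")
def reverse(text):
    # Stand-in for a real libindic module function.
    return text[::-1]


def handle_request(method, url, body=""):
    """Return (status_code, json_string) for a GET or POST request."""
    if method not in ("GET", "POST"):
        return 405, json.dumps({"error": "method not allowed"})
    parts = urlsplit(url)
    segments = tuple(s for s in parts.path.split("/") if s)
    if len(segments) != 2 or segments not in REGISTRY:
        return 404, json.dumps({"error": "unknown module or function"})
    # GET carries parameters in the query string, POST in a form-encoded body.
    raw = parts.query if method == "GET" else body
    params = {key: values[0] for key, values in parse_qs(raw).items()}
    try:
        result = REGISTRY[segments](**params)
    except TypeError:
        # Missing or unexpected parameters (a sketch: this also hides
        # TypeErrors raised inside the function itself).
        return 400, json.dumps({"error": "bad parameters"})
    return 200, json.dumps({"result": result})


print(handle_request("GET", "/demo/reverse?text=abc"))
```

Because the dispatcher only looks up callables, the existing JSON-RPC entry points can stay in place and share the same underlying functions.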

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English-to-Indic and Indic-to-English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic-language transliteration system. Currently only Malayalam and Kannada work without any external language support; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English-to-IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English-to-Indic transliteration system using IPA. Currently English-to-Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words and in turn limits the output of the English-to-Indic transliteration system.
# Improve the ISO 15919-to-Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. As an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
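
One way to avoid the Malayalam pivot is to exploit the fact that the major Indic scripts occupy parallel 128-code-point blocks in Unicode. The Python sketch below shifts code points directly from the source block to the target block. It is only a starting point under stated assumptions, not the current module's method: the function name is hypothetical, and because not every position is assigned in every script (Tamil in particular has many gaps), the exception rules mentioned above are still required.

```python
# Start of each script's Unicode block; every block spans 0x80 code points.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Map characters of the source script onto the same offsets in the target block."""
    source_start = BLOCK_START[source]
    offset = BLOCK_START[target] - source_start
    converted = []
    for char in text:
        code_point = ord(char)
        if source_start <= code_point < source_start + BLOCK_SIZE:
            # May yield an unassigned code point; real code needs exception rules here.
            converted.append(chr(code_point + offset))
        else:
            converted.append(char)  # leave spaces, digits, Latin text untouched
    return "".join(converted)


# Malayalam KA + vowel sign AA, converted directly to Kannada.
print(transliterate("\u0d15\u0d3e", "malayalam", "kannada"))
```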

==== Integrating flask-webfonts extension with libindic ====

libindic used to have a Webfonts module for serving Indian language fonts as Webfonts for browsers. During GSoC 2013 it was separated into an extension for the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# Using the module currently breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the libindic code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above, we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in libindic into javascript modules library===

'''Project''':

Port some of the libindic algorithms to Node modules. Several modules and algorithms in the libindic project are currently implemented in Python, and porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is UMD: https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into libindic===

Create a libindic module that hosts [http://www.varnamproject.com Varnam]. This includes making a Python port of libvarnam and a libindic module that uses the Python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliteration tool, but differs significantly in the way word tokenization is done. It has a built-in learning system that allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox add-on], a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome add-on] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. It currently supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages lets projects use it easily to enable Indian language input.

To make Varnam easier to use from different languages, create a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. Any programming language with socket support can then be used with libvarnam. This also makes language bindings fairly easy to write, because they do not have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
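
A line-oriented text protocol for such a daemon might be handled as in the Python sketch below. Everything here is an assumption made for illustration: the <code>COMMAND LANG TEXT</code> line format, the <code>OK</code>/<code>ERR</code> replies, the command name and the stub backend are hypothetical, and a real daemon would call into the `libvarnam` shared library instead of the stub.

```python
def handle_line(line, backend):
    """Parse one request line and return one response line (without newline).

    Assumed request format:  COMMAND LANG TEXT
    Assumed response format: "OK <tab-separated results>" or "ERR <message>"
    """
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return "ERR expected: COMMAND LANG TEXT"
    command, lang, text = parts
    handler = backend.get(command.upper())
    if handler is None:
        return "ERR unknown command"
    results = handler(lang, text)
    return "OK " + "\t".join(results)


# Stub backend (assumption): stands in for calls into libvarnam.
STUB_BACKEND = {
    "TRANSLITERATE": lambda lang, text: [text.upper()],
}

print(handle_line("transliterate ml malayalam\n", STUB_BACKEND))
```

Keeping the parser separate from the socket handling means the same function can be called from any server loop, for example once per line read in a <code>socketserver.StreamRequestHandler</code>.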

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a libindic module, and it will be available on Android as part of the Android SDK project that libindic has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa libindic] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. This should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search, but it will have knowledge about how text can be written in the target language and provide smart filtering. Searching in a large corpus and providing real-time suggestions makes this a challenging task.
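
The combination of prefix search, previous-word context and learning can be sketched in Python as follows. This is an illustration only, not the libvarnam design: the <code>Suggester</code> class and its methods are hypothetical, the training sentences are placeholders, and the linear scan over all known words would have to be replaced by an indexed structure to give real-time results on a large corpus.

```python
from collections import Counter, defaultdict


class Suggester:
    """Prefix suggestions ranked by bigram context, then by word frequency."""

    def __init__(self):
        self.word_counts = Counter()
        self.bigram_counts = defaultdict(Counter)

    def learn(self, sentence):
        """Add the words of a sentence, and their adjacent pairs, to the corpus."""
        words = sentence.split()
        self.word_counts.update(words)
        for previous, word in zip(words, words[1:]):
            self.bigram_counts[previous][word] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return up to limit known words that start with prefix."""
        candidates = [w for w in self.word_counts if w.startswith(prefix)]
        followers = self.bigram_counts.get(previous, {})
        # Words seen after the previous word come first, then frequent words.
        candidates.sort(key=lambda w: (-followers.get(w, 0), -self.word_counts[w], w))
        return candidates[:limit]


suggester = Suggester()
for sentence in ["the cat", "the cat", "a car", "a car", "a car"]:
    suggester.learn(sentence)

print(suggester.suggest("ca"))
print(suggester.suggest("ca", previous="the"))
```

Without context the more frequent word ranks first. With a previous word supplied, words that have followed it before are promoted.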

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:56:45Z

Nandaja: /* Create an Android IME */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project'''ː

This project will be to make improvements on the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks areː
#Integrate Ibus-Braille with Liblouis
#Create Table editor for Liblouis
#Create a web version and host it.
#Add more indian languages to Liblouis
#Add facility to write direct braille Unicode characters
#Remove espeak dependency and make accessible via orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for varnamproject into android and have a native keyboard like indic keyboard. It can be integrated into Indic keyboard or have it as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

libindic project has a spellchecker written using python with a not so simple algorithm. But still it is not capable of handling inflection and agglutination occurring in Indian languages especially south Indian languages. The dictionary we have for Malayalam spellchecker have about 150000 words. Of course we can expand the dictionary, but that doesn't have much value since words can be formed in Malayalam or Tamil etc by joining multiple words. In addition to that, words get inflected based on grammar forms(sandhi), plural, gender etc. Hunspell has a system to handle this, but so far nobody succeeded in getting it working for multi level suffix stripping as required for Malayalam. Some times a Malayalam word can be formed by more than 5 words joining together. We will need a word splitting logic or a table taking care of all patterns. The project is to attempt solving this with hunspell. If that is not feasible(hunspell upstream is not active), develop an algorithm and implement it.

Recently Tamil attempted developing a spellchecker using Hunspell with multi level suffix stripping. You can see the result here https://github.com/thamizha/solthiruthi.
Our attempt should be first to use Hunspell to achieve spellchecking with agglutination and inflection. Probably it will require lot of scripting to generate suffix patterns, we can ask help from existing language communities too. If Hunspell has limitation with multi level suffxes- sometimes Indian languages require more than 5 levels of suffix stripping, we need to document it(bug and documentation) and try to attempt python based solution on top of libindic framework.

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': Average level understanding of grammar system of at least one Indian language and complete the homework as listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII have Indic language rendering support using XeTeX. but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support to Inidic rendering to ConTeXt MKIV. XeTeX is using Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
Generate the output using command
texexec --xetex <file.tex>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and basic understanding of Indic language rendering. MKIV uses Lua, familiarity with Lua, opentype specifications or Harfbuzz will be added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large vocabulary, speaker independent speech recognition codebase and suite of tools, which can be used to develop speech recognition system in any language. To develop an automatic speech recognition system in a language, acoustic model and language model has to framed for that particular language. Acoustic models characterize how sound changes over time. It captures the characteristics of basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, it will be useful to every one doing research in speech processing. For Indian languages Hindi, Tamil, Telugu and Marati, ASR systems have been developed using sphinx engine. In this project work is aimed at developing acoustic model and language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori1, 2, Hussein Hiyassat3, Mostafa Harti1, 2, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==libindic Project Based==

===libindic Project Improvements===

'''Project''':

This is set of ideas needed to improve the existing libindic infrastructure. We have decided following tasks as part of this project

# Provide REST API to libindic without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with libindic to provide Webfonts support.

==== Provide REST like API for libindic ====

libindic provides JSONRPC API currently which is also utilized by the templates of framework. JSONRPC is not well supported in all languages and results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST like HTTP based API's for libindic and at the same time leave the current JSONRPC code untouched for backward compatibility reasons.

'''Objectives''':

* Develop module or use existing module to provide REST like API's
* API should support GET and POST. [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people have doubt on how the API should look like. We can give twitter API (https://dev.twitter.com/docs/api) as example
Sample API calls :
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Paramets: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Paramets: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration between any two Indic languages, as well as English to Indic and Indic to English. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without going through another language; every other Indian language is first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target detour.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration relies on the CMU Sphinx dictionary, whose limited word list in turn limits the output.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.
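The idea of going through an intermediate representation instead of Malayalam can be sketched as follows. This is a toy illustration, not the module's actual code: the tables cover only five letters, the names <code>TO_ISO</code>, <code>FROM_ISO</code>, <code>EXCEPTIONS</code> and <code>transliterate</code> are made up for the example, and real text would also need handling of vowel signs, viramas and conjuncts.

```python
# Tiny illustrative tables: script letter -> ISO 15919 romanisation.
# Only a, ā, i, ka and ma are covered; real tables would be much larger.
TO_ISO = {
    "ml": {"\u0d05": "a", "\u0d06": "ā", "\u0d07": "i",
           "\u0d15": "ka", "\u0d2e": "ma"},
    "kn": {"\u0c85": "a", "\u0c86": "ā", "\u0c87": "i",
           "\u0c95": "ka", "\u0cae": "ma"},
}

# Reverse tables are derived, so each language needs only one mapping.
FROM_ISO = {
    lang: {iso: letter for letter, iso in table.items()}
    for lang, table in TO_ISO.items()
}

# Hypothetical exception rules: (source, target, word) -> fixed output.
EXCEPTIONS = {}


def to_iso(text, source):
    """Convert text to a list of ISO 15919 tokens; unknown characters pass through."""
    table = TO_ISO[source]
    return [table.get(ch, ch) for ch in text]


def transliterate(text, source, target):
    """Transliterate source -> ISO 15919 -> target, with exception overrides."""
    override = EXCEPTIONS.get((source, target, text))
    if override is not None:
        return override
    table = FROM_ISO[target]
    return "".join(table.get(token, token) for token in to_iso(text, source))
```

With this layout, adding a language means adding one table to and from the intermediate form rather than one table per language pair.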

==== Integrating flask-webfonts extension with libindic ====

libindic used to have a Webfonts module for serving Indian language fonts as webfonts to browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the libindic code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory of font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.
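The configuration generator from the list above could start out roughly like this. The output format here is an assumption made for the sketch (a JSON object keyed by font name, listing the available file formats); the actual format expected by flask-webfonts is not described on this page, and <code>collect_fonts</code> and <code>write_config</code> are hypothetical names.

```python
import json
import os

# File extensions treated as font files in this sketch.
FONT_EXTENSIONS = {".ttf", ".otf", ".woff", ".eot"}


def collect_fonts(path):
    """Return {font name: {format: file path}} for a font file or directory."""
    if os.path.isfile(path):
        candidates = [path]
    else:
        candidates = [
            os.path.join(root, name)
            for root, _dirs, files in os.walk(path)
            for name in files
        ]
    fonts = {}
    for candidate in sorted(candidates):
        stem, ext = os.path.splitext(os.path.basename(candidate))
        if ext.lower() in FONT_EXTENSIONS:
            # Group the different formats of one font under its base name.
            fonts.setdefault(stem, {})[ext.lower().lstrip(".")] = candidate
    return fonts


def write_config(path, output):
    """Write the collected font table to `output` as JSON (assumed format)."""
    with open(output, "w", encoding="utf-8") as handle:
        json.dump(collect_fonts(path), handle, indent=2, sort_keys=True)
```

A real tool would probably also read metadata such as the font family name and supported scripts from the font files themselves, which needs a font-parsing library.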

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in libindic into javascript modules library===

'''Project''':

Port some of the libindic algorithms to Node modules. Several modules and algorithms in the libindic project are currently written in Python. Porting them to JavaScript helps web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into libindic===

Create a libindic module which hosts [http://www.varnamproject.com varnam]. This includes making a Python port of libvarnam and a libindic module that uses the Python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use it easily to enable Indian language input.

To make Varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy to write, because they no longer need to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
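A simple text-based protocol of the kind described above might look like the following sketch. Everything here is an assumption for illustration: the <code>TRANSLITERATE</code> command name, the <code>OK n</code> / <code>ERR message</code> reply format, and the <code>stub_transliterate</code> function, which stands in for a real call into <code>libvarnam</code>. The actual daemon is proposed to be written in C; Python is used here only to show the shape of the protocol.

```python
import socketserver


def stub_transliterate(text):
    # Stand-in for a call into libvarnam; returns canned "suggestions".
    return [text.upper(), text.title()]


# Hypothetical command table: one entry per libvarnam operation exposed.
COMMANDS = {"TRANSLITERATE": stub_transliterate}


def handle_line(line):
    """Handle one request line 'COMMAND payload' and return the reply text.

    Replies are 'OK <count>' followed by one result per line, or 'ERR <reason>'.
    """
    command, _, payload = line.strip().partition(" ")
    handler = COMMANDS.get(command.upper())
    if handler is None:
        return "ERR unknown command\n"
    if not payload:
        return "ERR missing argument\n"
    results = handler(payload)
    return "OK %d\n%s\n" % (len(results), "\n".join(results))


class VarnamHandler(socketserver.StreamRequestHandler):
    """Serve the line protocol over TCP, one request per line."""

    def handle(self):
        for raw in self.rfile:
            reply = handle_line(raw.decode("utf-8"))
            self.wfile.write(reply.encode("utf-8"))
```

Because each request and reply is plain UTF-8 text, a client in any language needs only a socket and string handling, which is the point of the daemon approach.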

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also includes writing scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a libindic module and will be available on Android as part of the Android SDK project that Silpa has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa libindic] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam is using. Filtering suggestions will not be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
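The combination of prefix search and n-gram filtering can be illustrated with a small bigram model. This is a sketch in Python rather than C, and it is not how `libvarnam` stores or ranks words: the <code>Suggester</code> class and its methods are hypothetical, and a real corpus would need an indexed store instead of in-memory counters.

```python
from collections import Counter, defaultdict


class Suggester:
    """Prefix suggestions ranked by bigram context, then by word frequency."""

    def __init__(self):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)

    def learn(self, sentence):
        """Learn words and adjacent word pairs back into the corpus."""
        words = sentence.split()
        self.unigrams.update(words)
        for previous, current in zip(words, words[1:]):
            self.bigrams[previous][current] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return up to `limit` known words starting with `prefix`.

        Words that followed `previous` more often are ranked first.
        """
        candidates = [w for w in self.unigrams if w.startswith(prefix)]
        following = self.bigrams.get(previous, Counter())
        candidates.sort(key=lambda w: (-following[w], -self.unigrams[w], w))
        return candidates[:limit]
```

In this sketch the previous word only reorders the prefix matches; the language-aware filtering described above would replace the plain <code>startswith</code> test.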

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus of offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
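For orientation, a MARC21 record in its binary exchange form (ISO 2709) consists of a 24-character leader, a directory of 12-character entries (tag, field length, start offset) and the field data, with the byte <code>0x1E</code> ending each field, <code>0x1F</code> introducing each subfield and <code>0x1D</code> ending the record. The sketch below shows a minimal import and export of that structure. It is written in Python for illustration only (Grandham itself is a Ruby on Rails application), the function names are hypothetical, and it ignores indicators, character-set details and malformed input.

```python
FIELD_END, RECORD_END, SUBFIELD = b"\x1e", b"\x1d", b"\x1f"


def parse_record(raw):
    """Parse one ISO 2709 record into a list of (tag, value) pairs.

    Control fields (00X) yield a string; data fields yield a list of
    (subfield code, text) pairs. Indicators are dropped in this sketch.
    """
    base = int(raw[12:17])              # leader 12-16: base address of data
    directory = raw[24:base - 1]        # directory ends with a field terminator
    fields = []
    for offset in range(0, len(directory), 12):
        entry = directory[offset:offset + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice the field data without its trailing field terminator.
        data = raw[base + start:base + start + length - 1]
        if tag.startswith("00"):
            fields.append((tag, data.decode("utf-8")))
        else:
            chunks = data[2:].split(SUBFIELD)[1:]   # skip the two indicators
            fields.append((tag, [(c[:1].decode("ascii"), c[1:].decode("utf-8"))
                                 for c in chunks]))
    return fields


def build_record(fields):
    """Serialise (tag, value) pairs back into an ISO 2709 record."""
    directory, body = b"", b""
    for tag, value in fields:
        if tag.startswith("00"):
            data = value.encode("utf-8")
        else:
            data = b"  " + b"".join(
                SUBFIELD + code.encode("ascii") + text.encode("utf-8")
                for code, text in value)
        data += FIELD_END
        directory += ("%s%04d%05d" % (tag, len(data), len(body))).encode("ascii")
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = "%05dnam a22%05d a 4500" % (total, base)
    return leader.encode("ascii") + directory + FIELD_END + body + RECORD_END
```

In practice an existing MARC library for Ruby would likely be preferable to hand-written parsing; the sketch is only meant to show what the import and export steps have to deal with.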

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:55:22Z

Nandaja: /* Converting indic processing modules currently in libindic into javascript modules library */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project is to improve on the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate Ibus-Braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it.
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the espeak dependency and make it accessible via Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for Varnam on Android and provide a native keyboard like Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The libindic project has a spellchecker written in Python with a not-so-simple algorithm. It is still not capable of handling the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could of course expand the dictionary, but that has little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words together. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the libindic framework.
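To make "multi-level suffix stripping" concrete, here is a toy sketch of the idea. It is not the libindic spellchecker and not Hunspell's algorithm: the function name <code>find_stem</code> and the rule format are invented for the example, and the romanised word and the three rules are simplified illustrations that ignore most real sandhi changes.

```python
def find_stem(word, dictionary, rules, max_depth=6, path=()):
    """Strip suffixes level by level until a dictionary word is reached.

    `rules` is a list of (suffix, replacement) pairs; the replacement
    restores letters lost when the suffix was attached. Returns
    (stem, [suffixes stripped, outermost first]) or None if nothing matches.
    """
    if word in dictionary:
        return word, list(path)
    if max_depth == 0:
        return None
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[:len(word) - len(suffix)] + replacement
            found = find_stem(candidate, dictionary, rules,
                              max_depth - 1, path + (suffix,))
            if found is not None:
                return found
    return None


# Toy data: "maram" (tree) with plural, locative and "also" suffixes stacked.
DICTIONARY = {"maram"}
RULES = [("um", ""), ("il", ""), ("ngal", "m")]
```

With this data, <code>find_stem("marangalilum", DICTIONARY, RULES)</code> strips three levels and returns <code>("maram", ["um", "il", "ngal"])</code>. A word is accepted as correctly spelled when some chain of rules leads back to a dictionary entry, which is why the number of levels a tool supports matters for agglutinative languages.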

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time and capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==libindic Project Based==

===libindic Project Improvements===

'''Project''':

This is a set of ideas for improving the existing libindic infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for libindic without disturbing the existing JSON-RPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with libindic to provide Webfonts support.

==== Provide REST like API for libindic ====

libindic currently provides a JSON-RPC API, which is also used by the framework's templates. JSON-RPC is not well supported in all languages, which leads to [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like, HTTP-based APIs for libindic while leaving the current JSON-RPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration between any two Indic languages, as well as English to Indic and Indic to English. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without going through another language; every other Indian language is first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target detour.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration relies on the CMU Sphinx dictionary, whose limited word list in turn limits the output.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.

==== Integrating flask-webfonts extension with libindic ====

libindic used to have a Webfonts module for serving Indian language fonts as webfonts to browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the libindic code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory of font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in libindic into javascript modules library===

'''Project''':

Port some of the libindic algorithms to Node modules. Several modules and algorithms in the libindic project are currently written in Python. Porting them to JavaScript helps web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into libindic===

Create a libindic module which hosts [http://www.varnamproject.com varnam]. This includes making a Python port of libvarnam and a libindic module that uses the Python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use it easily to enable Indian language input.

To make Varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy to write, because they no longer need to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also includes writing scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and will be available on Android as part of the Android SDK project that Silpa has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam is using. Filtering suggestions will not be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus of offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:54:51Z

Nandaja: /* SILPA Project Based */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project is to improve on the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate Ibus-Braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it.
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the espeak dependency and make it accessible via Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for Varnam on Android and provide a native keyboard like Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn''':

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The libindic project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could expand the dictionary, but that has limited value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns; we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the libindic framework.
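
To make "multi-level suffix stripping" concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions, not the libindic spellchecker or Hunspell's algorithm: the function name, the toy English word list and the depth limit are all hypothetical, and it ignores the sandhi changes that happen where Malayalam words join.

```python
def is_known(word, stems, suffixes, max_depth=6):
    """Return True if word is a stem, or a stem followed by
    at most max_depth suffixes (stripped one level at a time)."""
    if word in stems:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        # Only strip a suffix if something is left to check afterwards.
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            if is_known(word[:-len(suffix)], stems, suffixes, max_depth - 1):
                return True
    return False


# Toy data (hypothetical): one stem and two suffixes.
stems = {"help"}
suffixes = {"less", "ness"}
print(is_known("helplessness", stems, suffixes))               # two levels
print(is_known("helplessness", stems, suffixes, max_depth=1))  # depth too small
```

A real solution would also have to undo the letter changes at each join, which is where a rule table or word-splitting logic comes in.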

The project is not about coding an existing algorithm, but about developing and implementing one.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with any Indian language, such as Malayalam, for spell correction in editors or word processors, and understand its limitations.
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this (ask for documents too) and note down your observations.
# Study Hunspell and other spellcheckers to see how this problem is addressed.
# Understand how a spellchecker works and how to write one from scratch.
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII supports Indic language rendering through XeTeX, but MKII is deprecated, and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz to render Indic scripts correctly.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using the command:
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty associated with a sequence or collection of words. It attempts to convey the behavior of the language and to predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
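
As a rough illustration of what a language model estimates, the Python sketch below computes plain maximum-likelihood bigram probabilities from a few sentences. It is a teaching example only: the function names are hypothetical, it is not part of the CMU Sphinx toolchain, and a usable model would be trained on a large Malayalam text corpus with smoothing for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])              # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


unigrams, bigrams = train_bigram(["a b", "a c"])
print(bigram_probability("a", "b", unigrams, bigrams))
```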

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==libindic Project Based==

===libindic Project Improvements===

'''Project''':

This is a set of ideas for improving the existing libindic infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for libindic without disturbing the existing JSON-RPC API
# Improve the Transliteration module
# Integrate the [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with libindic to provide Webfonts support

==== Provide REST like API for libindic ====

libindic currently provides a JSON-RPC API, which is also used by the templates of the framework. JSON-RPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like HTTP-based APIs for libindic while leaving the current JSON-RPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs.
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
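
The generic pattern above can be sketched as a small dispatcher that maps a module/function_name path to a Python callable and returns its result as JSON. This is only a sketch under assumptions: the registry, the demo function and the error format are hypothetical, and a real implementation would sit behind Flask routes and look up the actual libindic modules.

```python
import json


def to_upper(text):
    """Hypothetical stand-in for a real module function."""
    return text.upper()


# Hypothetical registry: (module, function_name) -> callable.
REGISTRY = {("demo", "to_upper"): to_upper}


def handle_request(path, params, registry=REGISTRY):
    """Resolve /module/function_name and return (status, JSON body)."""
    parts = tuple(p for p in path.split("/") if p)
    if len(parts) != 2 or parts not in registry:
        return 404, json.dumps({"error": "unknown module or function"})
    try:
        result = registry[parts](**params)
    except TypeError as exc:
        # Wrong or missing parameters; a real service would validate
        # arguments explicitly instead of relying on TypeError.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})


print(handle_request("/demo/to_upper", {"text": "abc"}))
```

Because the function parameters arrive as a plain dictionary, the same handler can serve both the query string of a GET request and the form data of a POST request.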

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic-language transliteration system. Currently only Malayalam and Kannada work without going through another language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. English to Indic transliteration currently uses the CMU Sphinx dictionary, which has a limited set of words and in turn limits the output of English to Indic transliteration.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.
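
One possible starting point for direct source-to-target conversion, not necessarily how the Transliteration module works today: the Unicode blocks of the major Indic scripts are laid out in parallel, so many letters can be mapped by shifting a code point from one block to another. The Python sketch below assumes only this offset mapping; the function name and the script table are illustrative. Code points that are unassigned in the target block, and script-specific letters with no counterpart, still need the exception rules mentioned above.

```python
# Start of each script's 128-code-point Unicode block.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift characters of the source script into the target script's block.

    Characters outside the source block (spaces, Latin, digits) pass through.
    """
    src, dst = SCRIPT_BASE[source], SCRIPT_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)


print(transliterate("മലയാളം", "malayalam", "kannada"))
```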

==== Integrating flask-webfonts extension with libindic ====

libindic used to have a Webfonts module for serving Indian language fonts as web fonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it makes other modules break. This needs to be fixed (it can be checked with the 'webfonts' branch of the libindic code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (A possible such tool, now outdated, can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Knowledge of web fonts and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in libindic into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the libindic project are currently written in Python, but porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into libindic===

Create a libindic module which hosts [http://www.varnamproject.com Varnam]. This includes making a Python port of libvarnam and a libindic module which uses the Python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome add-on], and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. It currently supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use Varnam easily to enable Indian language input.

To make Varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy to write, because they do not have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
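
To give a feel for what a simple text-based protocol could look like, here is a Python sketch of a line-oriented request handler. Everything in it is an assumption made for illustration: the one-command-per-line format, the `OK`/`ERR` responses and the `TRANSLITERATE` command name are hypothetical, and the handler passed in below is a dummy rather than a call into `libvarnam`.

```python
import json


def handle_line(line, commands):
    """Parse one request line 'COMMAND argument' and return one response line."""
    name, _, argument = line.strip().partition(" ")
    handler = commands.get(name.upper())
    if handler is None:
        return "ERR unknown command\n"
    # JSON keeps the response on a single line even for Unicode text.
    return "OK " + json.dumps(handler(argument), ensure_ascii=False) + "\n"


# Dummy command table; a real daemon would call into libvarnam here.
commands = {"TRANSLITERATE": lambda text: [text[::-1]]}
print(handle_line("TRANSLITERATE abc\n", commands), end="")
```

A client in any language then only needs to open a socket, write a line and read a line back.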

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would be something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. This should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search, but it will have knowledge about how text can be written in the target language and provide smart filtering. Searching in a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
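
The combination of prefix search, previous-word context and learning can be sketched in Python as follows. This is an illustration of the idea only, not the libvarnam design: the class and method names are hypothetical, the context is limited to bigrams, and the linear scan over the vocabulary would have to be replaced by an indexed lookup to give real-time suggestions on a large corpus.

```python
from collections import Counter, defaultdict


class Suggester:
    def __init__(self):
        self.word_counts = Counter()           # how often each word was seen
        self.following = defaultdict(Counter)  # previous word -> next-word counts

    def learn(self, text):
        """Add the words of text, and their bigrams, to the corpus."""
        words = text.split()
        self.word_counts.update(words)
        for prev, word in zip(words, words[1:]):
            self.following[prev][word] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Words starting with prefix, best match for the context first."""
        context = self.following.get(previous, Counter())
        candidates = [w for w in self.word_counts if w.startswith(prefix)]
        # Rank by bigram count, then overall frequency, then alphabetically.
        candidates.sort(key=lambda w: (-context[w], -self.word_counts[w], w))
        return candidates[:limit]


s = Suggester()
for line in ["red car", "red cat", "red cat", "big car"]:
    s.learn(line)
print(s.suggest("ca", previous="red"))
```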

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
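
For orientation, a binary MARC 21 record consists of a 24-byte leader, a directory of 12-byte entries (3-digit tag, 4-digit field length, 5-digit start offset) and the field data, with `0x1E` ending each field and `0x1F` introducing each subfield. The Python sketch below parses one such record. It is only an illustration of the format: the function name and the returned structure are hypothetical, indicators are skipped, and the actual feature would be written in Ruby for Grandham, most likely on top of an existing MARC library.

```python
SUBFIELD = b"\x1f"  # subfield delimiter


def parse_marc_record(record):
    """Parse one binary MARC 21 record into a list of (tag, value) pairs.

    Control fields (tags below 010) yield a string; data fields yield
    a dict mapping each subfield code to a list of values.
    """
    base = int(record[12:17])          # base address of data, from the leader
    directory = record[24:base - 1]    # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # The field length includes its terminator, so leave the last byte out.
        data = record[base + start:base + start + length - 1]
        if tag < "010":
            fields.append((tag, data.decode("utf-8")))
        else:
            subfields = {}
            # data[0:2] holds the two indicators; subfields follow.
            for part in data[2:].split(SUBFIELD)[1:]:
                code = chr(part[0])
                subfields.setdefault(code, []).append(part[1:].decode("utf-8"))
            fields.append((tag, subfields))
    return fields
```

Export is the reverse operation: serialize each field, build the directory from the field lengths and offsets, and write the record length and base address into the leader.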

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:52:01Z

Nandaja: /* A spell checker for Indic language that understands inflections */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ].
* If you want to propose an idea, please do so on the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam Computing].

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project builds on the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate ibus-braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the eSpeak dependency and make it accessible via Orca itself

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn''':

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add Varnam support to Android as a native keyboard similar to Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn''':

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The libindic project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could expand the dictionary, but that has limited value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns; we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the libindic framework.

The project is not about coding an existing algorithm, but about developing and implementing one.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with any Indian language, such as Malayalam, for spell correction in editors or word processors, and understand its limitations.
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this (ask for documents too) and note down your observations.
# Study Hunspell and other spellcheckers to see how this problem is addressed.
# Understand how a spellchecker works and how to write one from scratch.
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII supports Indic language rendering through XeTeX, but MKII is deprecated, and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz to render Indic scripts correctly.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using the command:
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty associated with a sequence or collection of words. It attempts to convey the behavior of the language and to predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSON-RPC API
# Improve the Transliteration module
# Integrate the [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support

==== Provide REST like API for SILPA ====

SILPA currently provides a JSON-RPC API, which is also used by the templates of the framework. JSON-RPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like HTTP-based APIs for SILPA while leaving the current JSON-RPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs.
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic-language transliteration system. Currently only Malayalam and Kannada work without going through another language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. English to Indic transliteration currently uses the CMU Sphinx dictionary, which has a limited set of words and in turn limits the output of English to Indic transliteration.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as web fonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it makes other modules break. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (A possible such tool, now outdated, can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript would help web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox addon], a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use varnam easily to enable Indian language input.

To make using varnam from different languages easier, create a cross-platform standalone process which uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy because they don't have to work with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
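To make the idea of a simple text-based protocol concrete, here is a minimal sketch of a line-oriented request handler. The wire format, the command name and the stand-in backend are all invented for this example; a real daemon would route each command to the corresponding `libvarnam` call instead.

```python
# Hypothetical wire format for a varnam daemon; nothing here is an existing API.
# Request:  COMMAND<space>ARGUMENT\n
# Response: OK<space>tab-separated results\n   or   ERR<space>reason\n


def make_handler(commands):
    """Build a function that turns one request line into one response line.

    `commands` maps a command name to a callable taking the argument string
    and returning a list of result strings. In a real daemon these callables
    would wrap the libvarnam shared library.
    """
    def handle_line(line):
        name, _, argument = line.strip().partition(" ")
        func = commands.get(name.upper())
        if func is None:
            return "ERR unknown command\n"
        if not argument:
            return "ERR missing argument\n"
        return "OK " + "\t".join(func(argument)) + "\n"
    return handle_line


# Stand-in backend so the sketch runs without libvarnam installed.
fake_backend = {"TRANSLITERATE": lambda text: [text.upper(), text.title()]}
handle = make_handler(fake_backend)
print(repr(handle("transliterate bharat")))
```

Because the handler is a pure function from a request line to a response line, it can be tested without a network and later placed behind any socket server, for example one built on Python's `socketserver` module.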

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam has knowledge of a lot of words. This idea proposes a method to use these words to provide suggestions for other input systems. In Varnam, the API call will be something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions which have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search; it will use knowledge about how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
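The combination of prefix lookup and n-gram filtering described above can be sketched as follows. This is a toy model in Python rather than libvarnam code: the class name, the in-memory sorted list and the bigram counter are assumptions chosen to keep the example short, and a real implementation would need an on-disk index to stay fast on a large corpus.

```python
# Illustrative only, not libvarnam code. Prefix lookup over a sorted word
# list, ranked by how often each candidate followed the previous word.
from bisect import bisect_left
from collections import Counter


class Suggester:
    def __init__(self):
        self.words = []           # sorted list of known words
        self.bigrams = Counter()  # (previous word, word) -> count

    def learn(self, sentence):
        """Add the words of a sentence and its bigrams to the corpus."""
        tokens = sentence.split()
        for word in tokens:
            i = bisect_left(self.words, word)
            if i == len(self.words) or self.words[i] != word:
                self.words.insert(i, word)
        self.bigrams.update(zip(tokens, tokens[1:]))

    def suggest(self, prefix, previous=None, limit=5):
        """Return words starting with prefix, most likely continuation first."""
        start = bisect_left(self.words, prefix)
        matches = []
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break  # words sharing a prefix are contiguous in sorted order
            matches.append(word)
        # list.sort() is stable, so ties keep alphabetical order.
        matches.sort(key=lambda w: -self.bigrams[(previous, w)])
        return matches[:limit]


s = Suggester()
for text in ["good morning", "good morning", "good mood", "more good"]:
    s.learn(text)
print(s.suggest("mo", previous="good"))  # ['morning', 'mood', 'more']
```

The example uses Latin words for readability; the same code works on any Unicode strings, including Devanagari or Malayalam words.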

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
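For orientation, a MARC21 record follows the ISO 2709 layout: a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset), and then the fields themselves. The sketch below shows a round trip in Python under simplifying assumptions: records are UTF-8, the leader values other than the lengths are placeholders, and the function names are invented for this example. Grandham itself is a Ruby on Rails application, where an existing MARC library would likely be a better starting point than hand-written parsing.

```python
# Minimal sketch of MARC21 (ISO 2709) import/export; not Grandham code.
FIELD_END, SUBFIELD, RECORD_END = b"\x1e", b"\x1f", b"\x1d"


def build_record(fields):
    """Serialize [(tag, value)] pairs into one record.

    value is a str for control fields (001-009), otherwise a tuple
    (indicators, [(subfield code, text), ...]).
    """
    directory, body = b"", b""
    for tag, value in fields:
        if isinstance(value, str):
            data = value.encode("utf-8")
        else:
            indicators, subfields = value
            data = indicators.encode("ascii") + b"".join(
                SUBFIELD + code.encode("ascii") + text.encode("utf-8")
                for code, text in subfields)
        data += FIELD_END
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    directory += FIELD_END
    base = 24 + len(directory)      # where the field data starts
    total = base + len(body) + 1    # +1 for the record terminator
    leader = b"%05dnam a22%05d   4500" % (total, base)  # placeholder leader values
    return leader + directory + body + RECORD_END


def parse_record(raw):
    """Return {tag: [field, ...]} for one record given as bytes."""
    base = int(raw[12:17])          # leader bytes 12-16: base address of data
    directory = raw[24:base - 1]    # drop the terminator that closes the directory
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # strip field terminator
        if tag < "010":             # control field: plain text
            value = data.decode("utf-8")
        else:                       # data field: two indicators, then subfields
            value = {
                "indicators": data[:2].decode("ascii"),
                "subfields": [(part[:1].decode("ascii"), part[1:].decode("utf-8"))
                              for part in data[2:].split(SUBFIELD)[1:]],
            }
        fields.setdefault(tag, []).append(value)
    return fields


record = build_record([("001", "12345"),
                       ("245", ("10", [("a", "Title"), ("c", "Author")]))])
print(parse_record(record)["245"][0]["subfields"])
```

Field lengths and offsets are counted in bytes rather than characters, which matters for Malayalam titles encoded as UTF-8.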

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:51:21Z

Nandaja: /* A spell checker for Indic language that understands inflections */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project will make improvements to the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate Ibus-Braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it.
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the espeak dependency and make it accessible via Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for the Varnam project on Android as a native keyboard similar to Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The libindic project has a spellchecker written in Python with a non-trivial algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that has little value since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected according to grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by more than 5 words joined together. We will need word-splitting logic or a table covering all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
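To show what multi-level suffix stripping means in practice, here is a toy sketch in Python. It is not the algorithm the project is expected to produce: the stems and suffixes are simplified romanized placeholders, and the sound changes (sandhi) that happen where parts join are ignored entirely.

```python
# Toy sketch of multi-level suffix stripping; the word lists are simplified
# romanized placeholders and sandhi changes at the joins are ignored.

def is_valid(word, stems, suffixes, max_levels=6):
    """Accept a word if it is a known stem followed by at most max_levels suffixes."""
    if word in stems:
        return True
    if max_levels == 0:
        return False
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            if is_valid(word[:-len(suffix)], stems, suffixes, max_levels - 1):
                return True
    return False


STEMS = {"mala", "kutti"}
SUFFIXES = {"kal", "ude", "il"}

print(is_valid("kuttikalude", STEMS, SUFFIXES))     # True: kutti + kal + ude
print(is_valid("kuttikalude", STEMS, SUFFIXES, 1))  # False: needs two levels
```

A real solution would also have to constrain which suffixes may follow which stems and undo the spelling changes at each join, which is where most of the difficulty lies.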

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with an Indian language such as Malayalam for spell correction in editors or word processors and understand its limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on the subject (ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how they address this problem
# Understand how a spell checker works and how to write one from scratch
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using the command:
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar (AVNIET, Hyderabad), Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas to improve the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the templates of the framework. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST-like HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module or use an existing module to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
<code><pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data

POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data

Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre></code>
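The generic `module/function_name` pattern above could be served by a small dispatch layer along these lines. This is a sketch using only the Python standard library: the registry, the decorator and the placeholder conversion function are hypothetical, and in SILPA the `handle_request` step would sit behind Flask routes instead of being called directly.

```python
# Sketch only: the registry and the stand-in function are hypothetical.
import inspect
import json

REGISTRY = {}


def expose(module, name):
    """Register a callable under /<module>/<function_name>."""
    def decorator(func):
        REGISTRY[(module, name)] = func
        return func
    return decorator


@expose("payyans", "ASCII2Unicode")
def ascii_to_unicode(text, font):
    # Placeholder body; the real conversion lives in the Payyans module.
    return "converted(%s, %s)" % (text, font)


def handle_request(path, params):
    """Map a request path and its parameters to (HTTP status, JSON body)."""
    parts = path.strip("/").split("/")
    func = REGISTRY.get(tuple(parts)) if len(parts) == 2 else None
    if func is None:
        return 404, json.dumps({"error": "unknown API"})
    try:
        # Check the parameters against the function signature before calling.
        bound = inspect.signature(func).bind(**params)
    except TypeError:
        return 400, json.dumps({"error": "bad parameters"})
    return 200, json.dumps({"result": func(*bound.args, **bound.kwargs)})


print(handle_request("/payyans/ASCII2Unicode", {"text": "abc", "font": "karthika"}))
```

Keeping the dispatch independent of the web framework means the existing JSONRPC entry point and a new REST entry point could share the same registry of functions.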

==== Improve Transliteration module ====

We have a Transliteration module which supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. But the module isn't in perfect shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic-language transliteration system. Currently only Malayalam and Kannada work without going through another language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and this needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. English to Indic transliteration currently relies on the CMU Sphinx dictionary, which covers a limited set of words and in turn limits the output of the system.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework, which can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned and using it makes other modules break. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionalities.
# Adhere to Flask extension guidelines and submit the modules to Flask extensions directory.
# Write a tool which takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension which expose the CSS to applications.

For all the tasks above, we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript would help web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox addon], a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use varnam easily to enable Indian language input.

To make using varnam from different languages easier, create a cross-platform standalone process which uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy because they don't have to work with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam has knowledge of a lot of words. This idea proposes a method to use these words to provide suggestions for other input systems. In Varnam, the API call will be something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions which have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search; it will use knowledge about how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:05:34Z

Nandaja: /* ibus-braille module modifications */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project will make improvements to the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate Ibus-Braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it.
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the espeak dependency and make it accessible via Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for the Varnam project on Android as a native keyboard similar to Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a non-trivial algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that has little value since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected according to grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by more than 5 words joined together. We will need word-splitting logic or a table covering all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with an Indian language such as Malayalam for spell correction in editors or word processors and understand its limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on the subject (ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how they address this problem
# Understand how a spell checker works and how to write one from scratch
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': Average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz for correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience with either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications, or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen; it tries to capture the behavior of the language and predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu, and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
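
To make the role of the language model concrete, here is a minimal sketch of a bigram model in Python. It is an illustration only: the function names and the tiny English corpus are made up, the estimate is plain maximum likelihood without smoothing, and models built for actual recognition would be trained on a large Malayalam text corpus with dedicated tools.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count context words and bigrams over whitespace-tokenised sentences.
    <s> and </s> mark the sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate P(word | prev) = c(prev, word) / c(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Hypothetical toy corpus, for illustration only.
CORPUS = ["open the file", "open the door", "close the door"]
```

After `unigrams, bigrams = train_bigram_model(CORPUS)`, the call `bigram_probability("the", "door", unigrams, bigrams)` gives the relative frequency with which "door" follows "the" in the corpus. The recognizer uses such probabilities to prefer likely word sequences among acoustically similar candidates.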

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf "Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems"], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas to improve the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide REST API to SILPA without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like, HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs.
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>
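
The generic `module/function_name` pattern above could be dispatched roughly as in the following sketch. It is an assumption-laden illustration using only the Python standard library: `REGISTRY`, `handle_request`, and the `demo` functions are hypothetical stand-ins, not SILPA code, and a real implementation would sit behind Flask routes.

```python
import json

# Hypothetical registry mapping (module, function_name) to a callable.
# These demo functions are placeholders, not real SILPA modules.
REGISTRY = {
    ("demo", "reverse"): lambda text: text[::-1],
    ("demo", "repeat"): lambda text, times="2": text * int(times),
}


def handle_request(method, path, params):
    """Map METHOD /module/function_name plus its parameters to a function
    call and return a (status_code, json_body) pair."""
    if method not in ("GET", "POST"):
        return 405, json.dumps({"error": "method not allowed"})
    parts = tuple(p for p in path.split("/") if p)
    if len(parts) != 2 or parts not in REGISTRY:
        return 404, json.dumps({"error": "unknown module or function"})
    try:
        result = REGISTRY[parts](**params)
    except (TypeError, ValueError) as exc:
        # Missing, unexpected, or badly typed parameters from the caller.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})
```

For example, `handle_request("POST", "/demo/reverse", {"text": "abc"})` returns status 200 with a JSON-encoded result. Because the dispatcher only needs a registry of callables, it can be added alongside the existing JSONRPC code without changing it.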

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in perfect shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without any intermediate language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words, which in turn limits the output of English to Indic transliteration.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.
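
One possible way to avoid the detour through Malayalam is sketched below. It relies on the fact that the major Indic scripts occupy parallel 128-code-point blocks in Unicode, so many characters map by a fixed offset. This is not how the SILPA module works; the function name and the language codes used as keys are assumptions for illustration, and the sketch has no exception rules at all.

```python
# Base code point of each script's Unicode block.
# The two-letter keys are labels chosen for this sketch.
SCRIPT_BASE = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source script's block to the same
    position in the target script's block; leave other characters alone."""
    src, dst = SCRIPT_BASE[source], SCRIPT_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `transliterate("\u0915\u092e\u0932", "hi", "ml")` maps the Devanagari letters ka, ma, la directly to their Malayalam counterparts. Not every position is assigned in every script (Tamil, for instance, has no aspirated consonants) and some punctuation is shared between scripts, so this direct mapping still needs the exception rules and special-case handling mentioned above.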

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts to browsers. During GSoC 2013 it was separated into an extension for the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python. Porting them to JavaScript helps developers. For example, cross-language transliteration can also be done in JavaScript if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] & [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. There are language bindings available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use Varnam easily to enable Indian language input.

To make using Varnam from different languages easier, create a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to be used with libvarnam. It also makes language bindings fairly easy, because they do not have to work with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
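
A simple text-based protocol of this kind might look like the sketch below, written in Python for brevity even though the daemon itself would call into the C library. Everything here is hypothetical: the `TRANSLITERATE` command, the `OK`/`ERR` reply format, and `fake_transliterate`, which merely upper-cases its input instead of calling `libvarnam`.

```python
import socketserver


def fake_transliterate(text):
    """Stand-in for a call into libvarnam; it only upper-cases the input."""
    return text.upper()


# Hypothetical command table: protocol keyword -> handler function.
COMMANDS = {"TRANSLITERATE": fake_transliterate}


def handle_line(line):
    """Process one request line and return one response line."""
    command, _, argument = line.strip().partition(" ")
    handler = COMMANDS.get(command.upper())
    if handler is None:
        return "ERR unknown command"
    if not argument:
        return "ERR missing argument"
    return "OK " + handler(argument)


class DaemonHandler(socketserver.StreamRequestHandler):
    """Answers each newline-terminated request with one reply line."""

    def handle(self):
        for raw in self.rfile:
            reply = handle_line(raw.decode("utf-8"))
            self.wfile.write((reply + "\n").encode("utf-8"))


def serve(host="127.0.0.1", port=0):
    """Run the daemon until interrupted; port 0 lets the OS pick a port."""
    with socketserver.TCPServer((host, port), DaemonHandler) as server:
        server.serve_forever()
```

A client in any language then only needs to open a socket, write a line such as `TRANSLITERATE bharat`, and read one line back. Keeping the parsing in `handle_line` separate from the socket code also makes the protocol easy to test.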

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and will be available on Android as part of the Android SDK project that Silpa has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam uses. Filtering suggestions will not be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
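
The combination of prefix search, n-gram ranking, and learning described above can be sketched as follows. This is a Python illustration with a hypothetical `Suggester` class, not the `libvarnam` design: it uses bigrams only and scans all known words linearly, which would be too slow for a large corpus.

```python
from collections import Counter, defaultdict


class Suggester:
    """Prefix suggestions ranked by how often a word followed the
    previous word (bigram count), then by overall frequency."""

    def __init__(self):
        self.word_counts = Counter()
        self.bigram_counts = defaultdict(Counter)

    def learn(self, sentence):
        """Add the words and adjacent word pairs of a sentence to the corpus."""
        words = sentence.split()
        self.word_counts.update(words)
        for prev, word in zip(words, words[1:]):
            self.bigram_counts[prev][word] += 1

    def suggest(self, prefix, previous_word=None, limit=5):
        """Return up to `limit` known words starting with `prefix`."""
        candidates = [w for w in self.word_counts if w.startswith(prefix)]
        following = self.bigram_counts.get(previous_word, Counter())
        # Bigram count first, then word frequency, then alphabetical order
        # so that the result is deterministic.
        candidates.sort(key=lambda w: (-following[w], -self.word_counts[w], w))
        return candidates[:limit]
```

After learning a few sentences, `suggest("de", previous_word="new")` ranks words that have followed "new" ahead of more frequent words that have not. For real-time use the linear scan would have to be replaced by an indexed structure such as a trie or a sorted table.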

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
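
As a rough idea of what import and export involve, the sketch below writes and reads the binary MARC21 exchange layout: a 24-character leader, a directory of 12-character entries, then the fields. It is written in Python purely to illustrate the format (Grandham itself is a Ruby on Rails application), the function names are hypothetical, and the fixed leader values are assumed defaults rather than anything taken from Grandham.

```python
FIELD_END = b"\x1e"     # terminates the directory and every field
RECORD_END = b"\x1d"    # terminates the record
SUBFIELD = b"\x1f"      # precedes each subfield code


def build_record(fields):
    """Serialise fields into one MARC21 record.

    Control fields are (tag, text); data fields are
    (tag, indicators, [(code, value), ...]).
    """
    directory, body = b"", b""
    for field in fields:
        if len(field) == 2:
            data = field[1].encode("utf-8")
        else:
            _, indicators, subfields = field
            data = indicators.encode("ascii") + b"".join(
                SUBFIELD + code.encode("ascii") + value.encode("utf-8")
                for code, value in subfields
            )
        data += FIELD_END
        # Directory entry: tag (3), field length (4), start offset (5).
        directory += f"{field[0]}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # where the field data begins
    total = base + len(body) + 1            # whole record, incl. terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(record):
    """Parse one record back into the tuples accepted by build_record."""
    base = int(record[12:17].decode("ascii"))   # leader 12-16: base address
    directory = record[24:base - 1]             # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:])
        data = record[base + start:base + start + length - 1]
        if tag.startswith("00"):                # control field: plain text
            fields.append((tag, data.decode("utf-8")))
        else:
            subfields = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in data[2:].split(SUBFIELD) if chunk
            ]
            fields.append((tag, data[:2].decode("ascii"), subfields))
    return fields
```

A record built from a control number field and a title field can be passed through `build_record` and `parse_record` to get the same field list back. Field lengths and offsets are counted in bytes, which matters for Malayalam text encoded as UTF-8.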

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T08:04:15Z

Nandaja: /* Projects with confirmed mentors */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list] of [http://smc.org.in Swathanthra Malayalam computing]

=Projects with confirmed mentors=

== ibus-braille module modifications ==

'''Project''':

This project will make improvements to the [[GSoC/2014/Project_ideas#Adding_Braille_Keyboard_layouts_for_Indian_Languages_to_m17n_Library | project]] that was successfully completed by a student under SMC. The remaining tasks are:
#Integrate Ibus-Braille with Liblouis
#Create a table editor for Liblouis
#Create a web version and host it.
#Add more Indian languages to Liblouis
#Add a facility to write Braille Unicode characters directly
#Remove the espeak dependency and make it accessible via Orca itself.

'''Complexity''':

'''Confirmed Mentor''': Samuel Thibault

'''How to contact the mentor''': IRC - youpi on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add support for the Varnam project on Android and provide a native keyboard like Indic Keyboard. It can be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a non-trivial algorithm. However, it still cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We can of course expand the dictionary, but that adds little value because words in Malayalam, Tamil, etc. can be formed by joining multiple words. In addition, words are inflected according to grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently, the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug reports and documentation) and attempt a Python-based solution on top of the SILPA framework.

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell with any Indian language, such as Malayalam, for spell correction in editors or word processors, and understand its limitations.
# Study the nature of inflection and agglutination in Indian languages, read existing documents on the subject (ask for documents too), and note down your observations.
# Study Hunspell and other spellcheckers to see how they address this problem.
# Understand how a spell checker works and how to write one from scratch.
# Come up with a plan for addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': Average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz for correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience with either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications, or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen; it tries to capture the behavior of the language and predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu, and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf "Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems"], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas to improve the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide REST API to SILPA without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like, HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs.
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in perfect shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without any intermediate language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words, which in turn limits the output of English to Indic transliteration.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts to browsers. During GSoC 2013 it was separated into an extension for the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python. Porting them to JavaScript helps developers. For example, cross-language transliteration can also be done in JavaScript if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox]] & [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in more languages lets projects use it easily to enable Indian language input.

To make Varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. Any programming language with socket support can then use libvarnam. This also makes language bindings fairly easy to write, because they no longer have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
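
As a rough illustration of what a simple text-based protocol could look like, the Python sketch below parses one request line and returns one response line. Everything in it is an assumption made for illustration: the command name <code>TRANSLITERATE</code>, the <code>OK</code>/<code>ERR</code> response format, and the stub <code>fake_transliterate</code>, which stands in for a call into libvarnam and does not use the real library.

```python
import json


def parse_request(line):
    """Split 'COMMAND arg1 arg2' into (command, args)."""
    parts = line.strip().split()
    if not parts:
        raise ValueError("empty request")
    return parts[0].upper(), parts[1:]


def handle_request(line, commands):
    """Dispatch one request line and return one response line."""
    try:
        name, args = parse_request(line)
        handler = commands[name]
        return "OK " + json.dumps(handler(*args), ensure_ascii=False)
    except KeyError:
        return "ERR unknown command"
    except (TypeError, ValueError) as exc:
        # Wrong number of arguments or an empty request line.
        return "ERR " + str(exc)


def fake_transliterate(scheme, word):
    # Hypothetical stand-in for libvarnam; returns canned data only.
    table = {("ml", "malayalam"): ["മലയാളം"]}
    return table.get((scheme, word), [])


# Hypothetical command table; a real daemon would list every libvarnam command.
COMMANDS = {"TRANSLITERATE": fake_transliterate}

print(handle_request("transliterate ml malayalam", COMMANDS))
```

A real daemon would read such lines from a socket and write the responses back; keeping the parsing and dispatch in plain functions makes the protocol easy to test without a network.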

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam uses. Filtering suggestions won't be just a prefix search: it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.
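
The idea of combining a prefix search with n-gram context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class <code>SuggestionModel</code> and its methods are hypothetical names, it uses bigrams only, it keeps everything in memory, and it is not how libvarnam stores or searches its corpus.

```python
from collections import Counter, defaultdict


class SuggestionModel:
    """Toy in-memory model: word counts plus counts of adjacent word pairs."""

    def __init__(self):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)

    def learn(self, words):
        """Record word and word-pair frequencies from a sequence of words."""
        prev = None
        for word in words:
            self.unigrams[word] += 1
            if prev is not None:
                self.bigrams[prev][word] += 1
            prev = word

    def suggest(self, prefix, previous=None, limit=5):
        """Return words starting with prefix, preferring those seen after previous."""
        followers = self.bigrams.get(previous, Counter())
        candidates = [w for w in self.unigrams if w.startswith(prefix)]
        # Rank by bigram count first, then overall frequency, then alphabetically.
        candidates.sort(key=lambda w: (-followers[w], -self.unigrams[w], w))
        return candidates[:limit]


model = SuggestionModel()
for sentence in ["भारतीय रेल", "भारतीय सेना", "महान भारत"]:
    model.learn(sentence.split())

print(model.suggest("भार"))
print(model.suggest("भार", previous="महान"))
```

Without context the more frequent word ranks first; with the previous word supplied, the word that was actually seen after it moves to the top. The linear scan over all words is what would not scale to a large corpus and is the part a real implementation has to replace.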

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This makes it easy to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T05:01:23Z

Nandaja: /* Varnam Android Keyboard */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add Varnam support to Android as a native keyboard, similar to the Indic keyboard. It can either be integrated into the Indic keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that has little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
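
To make "multi-level suffix stripping" concrete, here is a minimal Python sketch. It is an assumption-laden toy, not the algorithm the project asks for: the function name <code>split_word</code> is hypothetical, the root and suffix lists are English placeholders rather than real Malayalam data, and it ignores the sound changes that sandhi introduces at the joins, which is precisely the hard part.

```python
def split_word(word, roots, suffixes, max_depth=5):
    """Return [root, suffix1, ...] if word is a known root followed by
    at most max_depth known suffixes, otherwise None."""
    if word in roots:
        return [word]
    if max_depth == 0:
        return None
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            # Strip one suffix and try to explain the remainder recursively.
            rest = split_word(word[:-len(suffix)], roots, suffixes, max_depth - 1)
            if rest is not None:
                return rest + [suffix]
    return None


# Placeholder data for illustration only.
ROOTS = {"help", "care"}
SUFFIXES = ["ness", "ful", "less"]

print(split_word("helpfulness", ROOTS, SUFFIXES))
print(split_word("helpfulness", ROOTS, SUFFIXES, max_depth=1))
```

A spellchecker built this way would accept a word when a split is found. The <code>max_depth</code> limit corresponds to the number of suffix levels the checker is willing to strip.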

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using command
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
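
The role of a language model can be illustrated with a tiny bigram model in Python. This is a sketch for intuition only, under clear assumptions: the function names are hypothetical, the estimate is a plain maximum-likelihood ratio without smoothing, and a real Sphinx language model would be built from a large Malayalam text corpus with dedicated tooling rather than code like this.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count context words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Placeholder training text; real data would be Malayalam sentences.
unigrams, bigrams = train_bigram_model(["a b", "a c"])
print(bigram_probability("a", "b", unigrams, bigrams))
```

A word sequence the model has seen often gets a high probability, which is how the recognizer prefers plausible word sequences over acoustically similar but unlikely ones.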

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide REST API to SILPA without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST-like HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing module, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
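
The generic <code>module/function_name</code> pattern above can be sketched as a small dispatcher in Python. This is an illustration under stated assumptions, not SILPA code: <code>dispatch</code>, <code>REGISTRY</code> and the placeholder <code>ascii_to_unicode</code> are hypothetical names, the response envelope (<code>result</code> / <code>error</code>) is invented, and a real implementation would sit behind Flask routes and call the actual modules.

```python
import json


def ascii_to_unicode(text, font):
    # Placeholder standing in for the real payyans conversion.
    return {"font": font, "text": text}


# Hypothetical registry mapping (module, function_name) to a callable.
REGISTRY = {("payyans", "ASCII2Unicode"): ascii_to_unicode}


def dispatch(path, params, registry=REGISTRY):
    """Map /module/function_name plus request parameters to (status, JSON body)."""
    parts = tuple(p for p in path.split("/") if p)
    if len(parts) != 2 or parts not in registry:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        result = registry[parts](**params)
    except TypeError as exc:
        # Missing or unexpected parameters for the target function.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result}, ensure_ascii=False)


print(dispatch("/payyans/ASCII2Unicode", {"text": "abc", "font": "demo"}))
```

Because the dispatcher only needs a path and a parameter dictionary, the same function can serve both GET (query string) and POST (form or JSON body) requests, and the existing JSONRPC code does not have to change.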

==== Improve Transliteration module ====

We have a Transliteration module which supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. We also support the IPA and ISO 15919 transliteration systems. But the module isn't in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without any external language support; all other Indian languages are first transliterated to Malayalam and then transliterated to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words, which in turn limits the output of the English to Indic transliteration system.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
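
One way to transliterate directly between two Indic scripts, without going through Malayalam, relies on the fact that the major Indic Unicode blocks share a common layout, so corresponding letters sit at the same offset within each block. The Python sketch below shows that idea only; it is not a description of how the SILPA module works, the function name and language codes are chosen for illustration, and letters with no counterpart in the target script are exactly the cases that need the exception rules mentioned above.

```python
# Start of each script's Unicode block; each block is 0x80 code points wide.
BLOCK_START = {
    "hi": 0x0900,  # Devanagari
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source block to the same offset in the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        # Characters outside the source block (spaces, digits, Latin) pass through.
        out.append(chr(dst + offset) if 0 <= offset < BLOCK_SIZE else ch)
    return "".join(out)


print(transliterate("कमल", "hi", "ml"))
print(transliterate("कमल", "hi", "kn"))
```

The mapping is purely mechanical, so it can produce code points that are unassigned or unused in the target script; a production module would layer a per-language exception table on top.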

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework, which can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# The module is not yet fine-tuned, and using it makes other modules break. This needs to be fixed. (It can be checked with the 'webfonts' branch of the SILPA code on GitHub.)
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool which takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (One such tool, now outdated, can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension which expose the CSS to applications.

For all the tasks above, we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Knowledge of webfonts and of writing extensions for Flask
# Test-driven development.

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python; porting them to JavaScript makes them usable by web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in more languages lets projects use it easily to enable Indian language input.

To make Varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. Any programming language with socket support can then use libvarnam. This also makes language bindings fairly easy to write, because they no longer have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam uses. Filtering suggestions won't be just a prefix search: it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This makes it easy to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T05:00:48Z

Nandaja: /* Varnam Android Keyboard */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== Varnam based ==

== Varnam Android Keyboard ==

'''Project''':

Add Varnam support to Android as a native keyboard, similar to the Indic keyboard. It can either be integrated into the Indic keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that has little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using command
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which leads to [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like HTTP APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>
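
The generic call can be pictured as a small dispatcher that maps the module and function name in the URL onto a registered Python function. The following is a minimal sketch under stated assumptions, not SILPA's actual code: the `REGISTRY` table, the `handle_request` helper and the stub conversion function are all hypothetical, and a real implementation would sit behind Flask routes.

```python
import json


def ascii_to_unicode(text, font):
    """Stand-in for a payyans conversion; it only echoes its input."""
    return {"text": text, "font": font}


# Hypothetical registry: module name -> {function name -> callable}.
REGISTRY = {"payyans": {"ASCII2Unicode": ascii_to_unicode}}


def handle_request(method, path, params):
    """Map '/api/<module>/<function_name>' to a registered function.

    Returns a (status_code, json_body) pair.
    """
    if method not in ("GET", "POST"):
        return 405, json.dumps({"error": "method not allowed"})
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3 or parts[0] != "api":
        return 404, json.dumps({"error": "unknown path"})
    _, module, function_name = parts
    func = REGISTRY.get(module, {}).get(function_name)
    if func is None:
        return 404, json.dumps({"error": "unknown module or function"})
    try:
        result = func(**params)
    except TypeError as exc:  # e.g. missing or unexpected parameters
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})
```

Keeping the dispatcher independent of the HTTP layer would let the existing JSONRPC entry point and any new REST routes share one registry.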

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without going through another language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target detour.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration relies on the CMU Sphinx dictionary, which contains a limited set of words and therefore limits the output.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can serve as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.
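
One way to avoid the Malayalam pivot for script-to-script conversion is to rely on the fact that the major Indic Unicode blocks share a parallel, ISCII-derived layout, so a character can be moved to another script by shifting its code point. The sketch below illustrates that idea only; the function name, the `exceptions` parameter and the script table are assumptions made for this example, and it is not how the current module works.

```python
import unicodedata

# Start of each script's Unicode block; each block spans 128 code points.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target, exceptions=None):
    """Shift each `source` script character into the `target` block.

    `exceptions` maps source characters to replacement strings and is
    consulted first. Characters outside the source block, or with no
    assigned code point in the target block, are kept unchanged.
    """
    exceptions = exceptions or {}
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])
            continue
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # name() returns the default "" for unassigned code points.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)
```

Tamil, for instance, has no code points for the aspirated consonants, so a pure offset shift leaves gaps; this is where exception rules and special-case handling come in.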

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# Using the module currently breaks other modules. This needs to be fixed. (It can be checked with the 'webfonts' branch of the SILPA code on GitHub.)
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory of font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all of the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org <preferred>

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python; porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, the planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use it easily to enable Indian language input.

To make varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy to write, because they do not have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
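
As an illustration of how small such a text protocol could be, the sketch below handles one request line at a time. Everything in it is hypothetical: the command names, the `OK`/`ERR` reply format and the `StubBackend` class (which stands in for calls into `libvarnam`) are assumptions for this example, not an existing Varnam interface.

```python
import json


class StubBackend:
    """Stand-in for a libvarnam wrapper; a real daemon would call the shared library."""

    def __init__(self):
        self.learned = []

    def transliterate(self, text):
        return [text.upper()]  # placeholder output, not real transliteration

    def learn(self, word):
        self.learned.append(word)
        return True


def handle_line(line, backend):
    """Process one request line and return one response line.

    Request:  '<COMMAND> <argument>'
    Response: 'OK <json>' on success, 'ERR <message>' otherwise.
    """
    command, _, argument = line.strip().partition(" ")
    if not argument:
        return "ERR missing argument"
    handlers = {
        "TRANSLITERATE": backend.transliterate,
        "LEARN": backend.learn,
    }
    handler = handlers.get(command.upper())
    if handler is None:
        return "ERR unknown command"
    return "OK " + json.dumps(handler(argument), ensure_ascii=False)
```

A client in any language would then only need to open a socket, write a line and read a line back.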

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also
includes writing scripts that make it easy to embed input on any web page. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that varnam is using. Filtering suggestions will not be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
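
The ranking idea can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the `libvarnam` implementation: the `SuggestionModel` class and its methods are made up for illustration, it uses bigrams only, and it scans every known word, which would be too slow for a large corpus.

```python
from collections import Counter, defaultdict


class SuggestionModel:
    """Prefix search re-ranked by bigram counts."""

    def __init__(self):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)  # previous word -> next-word counts

    def learn(self, sentence):
        """Add the words of `sentence` to the corpus and update the counts."""
        words = sentence.split()
        self.unigrams.update(words)
        for prev, word in zip(words, words[1:]):
            self.bigrams[prev][word] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return known words starting with `prefix`, best match first.

        Words seen after `previous` rank ahead; ties fall back to overall
        frequency and then to alphabetical order.
        """
        following = self.bigrams.get(previous, Counter())
        candidates = [w for w in self.unigrams if w.startswith(prefix)]
        candidates.sort(key=lambda w: (-following[w], -self.unigrams[w], w))
        return candidates[:limit]
```

A real-time version would likely replace the linear scan with an index such as a trie or a sorted table.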

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs such as varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
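
For orientation, a MARC 21 record in its ISO 2709 transmission format consists of a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset) and the field data. The following Python sketch reads one such record. It is only meant to show the layout; Grandham itself is a Rails application, and `parse_marc21` is a hypothetical helper, not part of Grandham.

```python
FIELD_END = b"\x1e"   # field terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def parse_marc21(record):
    """Parse one MARC 21 (ISO 2709) record given as bytes.

    Returns a list of (tag, value) pairs. The value is a string for
    control fields (tags below 010) and a dict with the indicators and
    the (code, text) subfields for all other fields.
    """
    base = int(record[12:17])        # leader bytes 12-16: base address of data
    directory = record[24:base - 1]  # the directory ends with a field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = record[base + start:base + start + length].rstrip(FIELD_END)
        if tag < "010":
            fields.append((tag, data.decode("utf-8")))
            continue
        indicators = data[0:2].decode("ascii")
        subfields = [
            (chunk[0:1].decode("ascii"), chunk[1:].decode("utf-8"))
            for chunk in data[2:].split(SUBFIELD)[1:]
        ]
        fields.append((tag, {"indicators": indicators, "subfields": subfields}))
    return fields
```

Export would be the reverse: serialize the fields, compute each length and offset for the directory, and write the total length and base address into the leader.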

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-16T04:50:01Z

Nandaja:

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeesh''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== Varnam based ==

=== Varnam Android Keyboard ===

'''Project''':

Add Varnam support to Android in the form of a native keyboard, similar to Indic Keyboard. It can either be integrated into Indic Keyboard or shipped as a separate keyboard.

'''Complexity''': Advanced

'''Confirmed Mentor''': Navaneeth K. N.

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''':

'''What the student will learn'''

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could of course expand the dictionary, but that would not add much value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by more than 5 words joined together. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell and, if that is not feasible (Hunspell upstream is not active), to develop an algorithm and implement it.

Recently there was an attempt to develop a Tamil spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug reports and documentation) and attempt a Python-based solution on top of the SILPA framework.
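
To make 'multi-level suffix stripping' concrete, here is a minimal recursive sketch. It is not Hunspell's algorithm and not the existing SILPA spellchecker; `split_word`, the depth limit of 8 and the toy English data in the usage note are assumptions chosen for illustration. It also ignores sandhi, where characters change at the join, which is the hard part for Malayalam.

```python
def split_word(word, roots, suffixes, max_depth=8):
    """Return [root, suffix, ...] if `word` is a known root followed by
    at most `max_depth` known suffixes, otherwise None."""
    if word in roots:
        return [word]
    if max_depth == 0:
        return None
    for suffix in suffixes:
        # Require a non-empty remainder so the recursion always shrinks the word.
        if len(suffix) < len(word) and word.endswith(suffix):
            rest = split_word(word[: -len(suffix)], roots, suffixes, max_depth - 1)
            if rest is not None:
                return rest + [suffix]
    return None
```

With `roots = {"help"}` and `suffixes = ["less", "ness"]`, `split_word("helplessness", roots, suffixes)` returns `["help", "less", "ness"]`, while a word that cannot be reduced to a known root returns `None` and would be flagged as misspelled.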

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated and the new MKIV backend does not support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
Generate the output using the command:
texexec --xetex <file.tex>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience with either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua; familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop speech recognition systems in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty associated with a sequence or collection of words; it attempts to convey the behavior of the language and to predict which word sequences are likely to occur. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which leads to [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like HTTP APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre>

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without going through another language; all other Indian languages are first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target detour.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration relies on the CMU Sphinx dictionary, which contains a limited set of words and therefore limits the output.
# Improve the ISO 15919 to Indic transliteration system. See [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can serve as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# Using the module currently breaks other modules. This needs to be fixed. (It can be checked with the 'webfonts' branch of the SILPA code on GitHub.)
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory of font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all of the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org <preferred>

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python; porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, the planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use it easily to enable Indian language input.

To make varnam easier to use from different languages, build a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to use libvarnam. It also makes language bindings fairly easy to write, because they do not have to deal with native interoperability. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also
includes writing scripts that make it easy to embed input on any web page. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a way to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that varnam is using. Filtering suggestions will not be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs such as varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015

2015-02-13T09:17:37Z

Nandaja:

__NOTOC__

[[File:Banner-gsoc2015.png]]

<div style="border-radius: 50px; width: 92%; background-color: #4AA02C; float: left; display: block; margin: 1.5%; border: 1px solid #C4C295; text-align: center; padding: 2.5%; padding-top: 0px"> <h2>[[GSoC/2015/Project_ideas |'''Project ideas''']]
</h2>[[SoC/2015/application-template|'''Student application Template''']]
</div>
<div>

 
</div>

==Status Updates==
*

==FAQ==
* '''Is it a requirement to know Malayalam to participate in GSoC as part of SMC?'''

It is not a requirement to know Malayalam to participate in GSoC
as part of SMC. However, it helps if you are proficient in an Indian
language in addition to the listed technologies.

* '''I have a project idea that is not listed in SMC project ideas. Can I propose new projects?'''

Of course. You are encouraged to propose fresh project ideas with as much detail as you can give. If the idea matches the objectives of SMC, we will be happy to evaluate it for GSoC. SMC is the umbrella project for Project Silpa, an Indian language computing project, so you are welcome to submit proposals for other languages too. Be aware that the availability of mentors for non-Malayalam languages is limited.

==Links==

*

GSoC/2015/application-template

2015-02-06T07:23:59Z

Nandaja: Created page with "==GSoC 2015 Application Template== ===Things to do=== * Join the `Student Projects mailing list http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.inn. Introduce..."

==GSoC 2015 Application Template==

===Things to do===

* Join the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in Student Projects mailing list]. Introduce yourself there so that the community can get to know you. Feel free to discuss your project ideas and to ask for any help you may require.

* Prepare a detailed proposal for your project and submit it on the Google Summer of Code website (http://www.google-melange.com). You must also post the application on the project wiki (http://wiki.smc.org.in/).

* If you need help with anything, ask on the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in Student projects list] or our IRC channel ([http://webchat.freenode.net/?randomnick=0&channels=smc-project #smc-project on Freenode]). Don't be afraid if you don't know git, for example. We'll teach you everything that is needed; the only things required from you are enthusiasm and a willingness to learn new things. But '''don't expect spoon feeding from us'''.

===Writing your proposal===

To be considered, a GSoC application must have a written proposal submitted to
http://www.google-melange.com/.

Start a wiki page to work on your proposal at
http://wiki.smc.org.in/. Every applicant should create an account on the SMC wiki. If you add your proposal under your userpage, we will help you edit it and provide feedback (though understand that we will not help you write it).

Note that your final application must be submitted at http://www.google-melange.com/, so do not worry about the formatting of your application on the wiki, as you will have to reformat it there. You should not be too concerned with the formatting in Melange either, as we understand that the text editor there is not the best for making things look nice. We are more concerned with the content of your application, and that it is readable.

'''The application template is given below'''
-----

====Personal information====
* Email Address:
* Telephone: '''No Need to Provide Phone Number in Wiki since it is going to be public'''
* Blog URL:
* Freenode IRC Nick:
* Your university and current education:
* Why do you want to work with Swathanthra Malayalam Computing?
* Do you have any past involvement with Swathanthra Malayalam Computing or another open source project as a contributor?
* Have you participated in past GSoC programs? If so, in which years and with which organizations?
* Do you have other obligations between May and August? Please note that we expect the Summer of Code to be a full-time, 40-hour-a-week commitment.
* Will you continue contributing to and supporting Swathanthra Malayalam Computing after the GSoC 2015 program? If yes, which area(s) are you interested in?
* Why should we choose you over other applicants?

====Proposal Description====
Please describe your proposal in detail.

'''NOTE''': Please do not copy text verbatim from the ideas page, or from other people's
discussions about your project, but rewrite it in your own words. If you
include any significant text or code from another source in your
application, it must be accompanied with a proper citation. All papers or
references that you use or plan to use must also be cited. Put all this in
a "References" section at the bottom of your application.

'''Include''':
* An overview of your proposal
* The need you believe it fulfills
* Any relevant experience you have
* How you intend to implement your proposal
* A rough timeline for your progress with phases
* Any other details you feel we should consider
* Tell us about something you have created.
* Have you communicated with a potential mentor? If so, who?
* SMC Wiki link of your proposal

-----

====Other requirements====
'''Make sure you have completed the following tasks to qualify; failing to complete any of them will result in your application being rejected.'''

* You have subscribed to the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in Student project] mailing list and the respective project mailing list (e.g. the [http://lists.nongnu.org/mailman/listinfo/silpa-discuss SILPA mailing list] for SILPA projects)
* Your application is available on [http://wiki.smc.org.in/ SMC Project wiki] under your userspace
* Your application is submitted to google-melange

In addition to the written proposal, we require every GSoC applicant to do this:
* Create an account on the SMC wiki and start a wiki page for your proposal (under your userpage). Keep it updated.
* We expect every GSoC participant to maintain a blog (if they do not already have one) and post about their project's status, development, etc.
* Update the project status on the mailing list regularly with a meaningful subject line (don't use something like 'GSoC Project Update').

'''Useful Links'''

* [http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2015/help_page# GSoC FAQ]
* [http://www.google-melange.com/gsoc/events/google/gsoc2015 Timeline]
* [http://en.flossmanuals.net/GSoCstudentguide/ GSoC Student manual]
* [http://en.flossmanuals.net/melange/students-students-application-phase/ Melange manual - Student application Phase ]

GSoC/2015/Project ideas

2015-02-06T07:20:49Z

Nandaja: /* Indic rendering support in ConTeXt */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a fairly involved algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that has limited value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammar forms (sandhi), plural, gender etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it working for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word can be formed by joining more than 5 words. We will need word-splitting logic or a table that covers all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
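
To make the idea of multi-level suffix stripping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the romanized root and suffix strings are made-up toy data, the function names are hypothetical, and real sandhi changes at word boundaries are ignored. It is not the SILPA or Hunspell algorithm.

```python
# Toy data (assumption): romanized placeholders, not a real Malayalam lexicon.
ROOTS = {"mara", "veedu"}
SUFFIXES = ["ngal", "il", "um"]


def split_word(word, roots, suffixes, max_depth=6):
    """Return [root, suffix1, suffix2, ...] if the word can be explained
    by stripping known suffixes down to a known root, else None."""
    if word in roots:
        return [word]
    if max_depth == 0:
        return None  # give up after max_depth levels of stripping
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            parts = split_word(stem, roots, suffixes, max_depth - 1)
            if parts is not None:
                return parts + [suffix]
    return None


def is_known(word, roots=ROOTS, suffixes=SUFFIXES):
    """A word passes the check if some root + suffix chain produces it."""
    return split_word(word, roots, suffixes) is not None
```

For example, `split_word("marangalilum", ROOTS, SUFFIXES)` strips three suffix levels before reaching the root. A real solution would also have to model the sound changes at each join, which is where most of the difficulty lies.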

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this (ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support for Indic rendering to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using command
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. Acoustic models characterize how sound changes over time and capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide REST API to SILPA without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the templates of the framework. JSONRPC is not well supported in all languages and results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST-like HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module or use an existing module to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
<code><pre>
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
</pre></code>
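
The generic `/api/module/function_name` pattern above could be served by a small dispatcher like the following Python sketch. It uses only the standard library so it stays independent of any web framework. The names `REGISTRY`, `expose`, `handle_request` and the `demo` module are hypothetical and are not part of SILPA.

```python
import json

# Hypothetical registry mapping (module, function_name) to a callable.
REGISTRY = {}


def expose(module, name):
    """Decorator that publishes a function under /api/<module>/<name>."""
    def decorator(func):
        REGISTRY[(module, name)] = func
        return func
    return decorator


@expose("demo", "reverse")
def reverse(text):
    # Stand-in for a real module function such as an ASCII/Unicode converter.
    return text[::-1]


def handle_request(path, params):
    """Resolve a path and request parameters to (status_code, json_body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3 or parts[0] != "api":
        return 404, json.dumps({"error": "unknown endpoint"})
    func = REGISTRY.get((parts[1], parts[2]))
    if func is None:
        return 404, json.dumps({"error": "unknown function"})
    try:
        result = func(**params)
    except TypeError as exc:
        # Missing or unexpected parameters; a real service would validate
        # arguments explicitly instead of relying on TypeError.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result}, ensure_ascii=False)
```

Because the dispatcher only takes a path and a parameter dictionary, the same function could sit behind both GET (query string) and POST (form or JSON body) handlers, next to the existing JSONRPC entry point.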

==== Improve Transliteration module ====

We have a Transliteration module which supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. But the module isn't in perfect shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic-language transliteration system. Currently only Malayalam and Kannada work without any external language support; all other Indian languages are first transliterated to Malayalam and then transliterated to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration is done using the CMU Sphinx dictionary, which has a limited set of words, which in turn limits the output of the English to Indic transliteration system.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve a more accurate result.
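
As one possible starting point for direct source -> target conversion, the sketch below relies on the fact that the major Indic scripts occupy parallel 128-code-point blocks in Unicode. It is a naive baseline and not the approach the SILPA module currently uses; the function and table names are hypothetical. Not every offset is assigned in every script (Tamil in particular has many gaps), which is exactly where the exception rules mentioned above come in.

```python
# Start of each script's Unicode block; every block is 0x80 code points wide.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source script into the target block.

    Characters outside the source block (spaces, digits, Latin text)
    are copied unchanged. No exception rules are applied.
    """
    src = SCRIPT_BASE[source]
    dst = SCRIPT_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)
```

Calling `transliterate(word, "devanagari", "malayalam")` converts directly between the two scripts without passing through a third one. The output may contain unassigned code points for letters the target script lacks, so a per-pair exception table would still be needed.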

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as Webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework which can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned and using it makes other modules break. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool which can take a directory containing font files, or a single font file, and generate the configuration file needed by the extension. (One such tool, now outdated, can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension which expose the CSS for applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org <preferred>

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon], and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. There are language bindings available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use Varnam easily to enable Indian language input.

To make using Varnam from different languages easier, build a cross-platform standalone process which uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to be used with libvarnam. It also makes language bindings fairly easy, because they don't have to work with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
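
To illustrate what a simple text-based protocol could look like, here is a Python sketch of the request parsing and response framing only. The command name `TRANSLITERATE`, the `OK`/`ERR` reply format and the stub backend are all assumptions made for this example; the actual daemon would be written in C and would call into `libvarnam`.

```python
def parse_command(line):
    """Split one request line into (COMMAND, [args])."""
    parts = line.strip().split(" ")
    if not parts[0]:
        raise ValueError("empty command")
    return parts[0].upper(), parts[1:]


def handle_line(line, backend):
    """Process one request line and return the full response text.

    `backend` maps a command name to a callable returning a list of
    result strings. Replies are framed as "OK <count>" followed by one
    result per line, or "ERR <message>".
    """
    try:
        command, args = parse_command(line)
    except ValueError as exc:
        return "ERR %s\n" % exc
    handler = backend.get(command)
    if handler is None:
        return "ERR unknown command %s\n" % command
    try:
        results = handler(*args)
    except TypeError:
        return "ERR wrong number of arguments for %s\n" % command
    return "OK %d\n%s" % (len(results), "".join(r + "\n" for r in results))


# Stub backend (assumption): upper-cases the input instead of calling libvarnam.
STUB_BACKEND = {
    "TRANSLITERATE": lambda scheme, word: [word.upper()],
}
```

A line-oriented format like this can be spoken by any language that can open a socket and read lines, which is the point of the daemon idea. The count in the `OK` line lets a client know how many result lines to read without needing a terminator.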

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes writing scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam has knowledge about a lot of words. This idea proposes a method to use these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions which have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. This should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search, but it will have knowledge about how text can be written in the target language and provide smart filtering. Searching in a large corpus and providing real-time suggestions makes this a challenging task.
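
The combination of prefix search and n-gram filtering described above can be sketched in Python as follows. This is only a model of the idea with bigrams (n = 2) and an in-memory dictionary; the class and method names are hypothetical, and it does not reflect how `libvarnam` stores its corpus.

```python
from collections import defaultdict


class SuggestionModel:
    """Prefix suggestions ranked by bigram context, then word frequency."""

    def __init__(self):
        self.unigrams = defaultdict(int)  # word -> count
        self.bigrams = defaultdict(int)   # (previous, word) -> count

    def learn(self, sentence):
        """Learn words and adjacent word pairs back into the corpus."""
        words = sentence.split()
        for word in words:
            self.unigrams[word] += 1
        for prev, word in zip(words, words[1:]):
            self.bigrams[(prev, word)] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return up to `limit` words starting with `prefix`.

        Words seen after `previous` rank first, then more frequent
        words, with alphabetical order as the final tie-breaker.
        """
        candidates = [w for w in self.unigrams if w.startswith(prefix)]

        def sort_key(word):
            context = self.bigrams.get((previous, word), 0)
            return (-context, -self.unigrams[word], word)

        return sorted(candidates, key=sort_key)[:limit]
```

The linear scan over all words is fine for a demonstration but would be too slow for a large corpus. A real implementation would need an index that supports prefix lookups, such as a trie or a sorted table, to give real-time suggestions.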

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-06T07:20:12Z

Nandaja: /* Projects with unconfirmed mentors */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a fairly involved algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that has limited value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammar forms (sandhi), plural, gender etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it working for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word can be formed by joining more than 5 words. We will need word-splitting logic or a table that covers all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this (ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support for Indic rendering to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using command
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC -

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. Acoustic models characterize how sound changes over time and capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSONRPC API
# Improve the Transliteration module
# Integrate the [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like, HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
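The generic <code>/module/function_name</code> pattern above could be served by a small dispatcher such as the sketch below. It uses only the Python standard library so that it runs on its own; in SILPA the same function would presumably be called from a Flask route. The registry, the <code>demo</code> module, and the <code>dispatch</code> function are hypothetical names invented for this illustration, not existing SILPA code.

```python
import json

# Hypothetical registry: module name -> {function name -> callable}.
REGISTRY = {
    "demo": {
        "upper": lambda text: text.upper(),
    },
}


def dispatch(path, params):
    """Map '/module/function_name' plus parameters to (status, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return 400, json.dumps({"error": "expected /module/function_name"})
    module, function_name = parts
    function = REGISTRY.get(module, {}).get(function_name)
    if function is None:
        return 404, json.dumps({"error": "unknown module or function"})
    try:
        # Request parameters are passed straight through as keyword arguments.
        result = function(**params)
    except TypeError as exc:
        # Missing or unexpected parameters surface as a client error.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result}, ensure_ascii=False)


status, body = dispatch("/demo/upper", {"text": "silpa"})
```

Because the dispatcher only looks functions up in a registry, the existing JSONRPC code can keep calling the same functions unchanged.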

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without any external language support; every other Indian language is first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words and in turn limits the output of the English to Indic transliteration system.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.
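As a starting point for direct source -> target transliteration, the sketch below relies on the fact that the major Indic scripts occupy parallel 128-code-point blocks in Unicode, so a character can be moved between scripts by shifting its offset. This is an illustrative baseline only, not the SILPA module's algorithm; the table and function names are assumptions, and letters that have no counterpart in the target script would need the exception rules mentioned above.

```python
# Start of each script's Unicode block; the blocks share a parallel layout.
BLOCK_START = {
    "devanagari": 0x0900,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift each character from the source block to the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            # Leave spaces, digits, and punctuation untouched.
            out.append(ch)
    return "".join(out)


result = transliterate("കമല", "malayalam", "kannada")
```

No intermediate Malayalam step is involved here: any pair of scripts in the table can be converted directly.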

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as web fonts to browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above, we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''':

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently implemented in Python, and porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is UMD: https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, the planned demo applications, the planned documentation, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use Varnam easily to enable Indian language input.

To make using Varnam from different languages easier, create a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to be used with libvarnam. It also makes language bindings fairly easy to write, because they don't have to deal with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
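A simple text-based protocol of this kind could look like the sketch below, where each request is one line of the form <code>COMMAND argument</code> and each reply is one line starting with <code>OK</code> or <code>ERR</code>. The command name, the reply format, and the stand-in transliteration function are all assumptions made for illustration; a real daemon would call into `libvarnam` and would wrap this handler in a network server (for example Python's `socketserver.StreamRequestHandler`).

```python
def fake_transliterate(word):
    """Stand-in for a libvarnam call; a real daemon would call the C library."""
    return [word.upper()]


# Hypothetical command set; the real one would mirror libvarnam's functions.
COMMANDS = {"TRANSLITERATE": fake_transliterate}


def handle_line(line):
    """Parse 'COMMAND argument' and return a single-line text reply."""
    command, _, argument = line.strip().partition(" ")
    handler = COMMANDS.get(command.upper())
    if handler is None:
        return "ERR unknown command\n"
    if not argument:
        return "ERR missing argument\n"
    # Multiple results are tab-separated on one line.
    return "OK " + "\t".join(handler(argument)) + "\n"


reply = handle_line("TRANSLITERATE bharat\n")
```

Keeping the protocol line-oriented means a client in any language needs only a socket and string splitting, which is the point of the daemon.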

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and will be available on Android as part of the Android SDK project that Silpa has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a method for using these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam is using. Filtering suggestions won't be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
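The combination of prefix search and n-gram filtering described above can be sketched as follows. This is a toy illustration in Python rather than C, and it is not how `libvarnam` is designed: the class, its methods, and the sample words are assumptions, and the linear scan over all words would have to be replaced by an indexed lookup to give real-time results on a large corpus.

```python
from collections import Counter, defaultdict


class SuggestionIndex:
    """Toy prefix search ranked by bigram counts."""

    def __init__(self):
        self.word_counts = Counter()
        # Maps a previous word to counts of the words that followed it.
        self.following = defaultdict(Counter)

    def learn(self, words):
        """Record a sequence of words and their adjacent pairs."""
        self.word_counts.update(words)
        for prev, word in zip(words, words[1:]):
            self.following[prev][word] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return words starting with prefix, best context match first."""
        context = self.following.get(previous, Counter())
        matches = [w for w in self.word_counts if w.startswith(prefix)]
        # Rank by how often the word followed `previous`, then by frequency.
        matches.sort(key=lambda w: (-context[w], -self.word_counts[w], w))
        return matches[:limit]


index = SuggestionIndex()
index.learn(["new", "delhi"])
index.learn(["old", "delta", "delta"])
suggestions = index.suggest("del", previous="new")
```

With the previous word supplied, the word that has followed it before is ranked first even though another match is more frequent overall; without context, plain frequency decides.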

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs such as varnam-ibus [2]. This makes it easy to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''':

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-06T07:18:34Z

Nandaja: /* A spell checker for Indic language that understands inflections */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could of course expand the dictionary, but that has limited value, since words in Malayalam, Tamil, and similar languages can be formed by joining multiple words. In addition, words get inflected according to grammatical forms (sandhi), plural, gender, and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all the patterns. The project is to attempt to solve this with Hunspell and, if that is not feasible (Hunspell upstream is not active), to develop an algorithm and implement it.

Recently, the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
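To show what multi-level suffix stripping means, here is a minimal recursive sketch. It is only a naive baseline for discussion, not a proposed solution and not Hunspell's algorithm: the function name is an assumption, the stems and suffixes are placeholder strings rather than real Malayalam morphology, and plain stripping ignores the character changes that sandhi introduces at word joins.

```python
def is_valid(word, stems, suffixes, max_depth=6):
    """Accept a word if it is a known stem followed by zero or more suffixes."""
    if word in stems:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        # Strip one suffix, then check the remainder recursively.
        if word.endswith(suffix) and len(word) > len(suffix):
            if is_valid(word[: -len(suffix)], stems, suffixes, max_depth - 1):
                return True
    return False


# Placeholder data, not real Malayalam morphology.
stems = {"stem"}
suffixes = {"a", "bb", "c"}
accepted = is_valid("stemabbc", stems, suffixes)
```

The <code>max_depth</code> parameter bounds the number of suffix levels; a limit that is too low rejects valid words, which is the kind of limitation the homework below asks applicants to investigate.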

The project is not about coding an existing algorithm, but to develop and implement an algorithm.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on the subject (ask for documents too), and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''':

'''Expertise required''': An average-level understanding of the grammar of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support via XeTeX, but MKII is deprecated and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
 \usemodule[simplefonts]
 \definefontfeature[malayalam][script=mlym]
 \setmainfont[Rachana][features=malayalam]
 \starttext
 മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
 \stoptext
Generate the output using the command:
 texexec --xetex <file.tex>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification, or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen; it attempts to convey the behavior of the language and to predict which word sequences are likely to occur in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu, and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas for improving the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide a REST API for SILPA without disturbing the existing JSONRPC API
# Improve the Transliteration module
# Integrate the [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages, which results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. We would therefore like to provide REST-like, HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module, or use an existing one, to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function

==== Improve Transliteration module ====

We have a Transliteration module that supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. It also supports the IPA and ISO 15919 transliteration systems. However, the module is not in good shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic transliteration system. Currently only Malayalam and Kannada work without any external language support; every other Indian language is first transliterated to Malayalam and then to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration uses the CMU Sphinx dictionary, which has a limited set of words and in turn limits the output of the English to Indic transliteration system.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages; we can explore it and assess its feasibility. Either IPA or the ISO 15919 standard can be used as an intermediate representation of the scripts. All of these must be supplemented with exception rules and special-case handling to achieve better results.

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as web fonts to browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The objectives are listed below.

# The module is not yet fine-tuned, and using it breaks other modules. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool that takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (An outdated tool of this kind can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension that expose the CSS to applications.

For all the tasks above, we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Mailing List''': silpa-discuss@nongnu.org (preferred)

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently implemented in Python, and porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is UMD: https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, the planned demo applications, the planned documentation, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - jishnu7 santhosh on #smc-project and #silpa on Freenode

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project and #silpa on Freenode

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. Language bindings are available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use Varnam easily to enable Indian language input.

To make using Varnam from different languages easier, create a cross-platform standalone process that uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to be used with libvarnam. It also makes language bindings fairly easy to write, because they don't have to deal with native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and will be available on Android as part of the Android SDK project that Silpa has proposed. This idea has been merged into the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam knows a lot of words. This idea proposes a method for using these words to provide suggestions for other input systems. In Varnam, the API call would look something like this:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions that have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that Varnam is using. Filtering suggestions won't be just a prefix search; it will use knowledge of how text can be written in the target language to provide smart filtering. Searching a large corpus and providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs such as varnam-ibus [2]. This makes it easy to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2015/Project ideas

2015-02-06T07:17:49Z

Nandaja: /* Projects with unconfirmed mentors */

<center>
<font color="green"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)

=Ideas for Google Summer of Code 2015=
* Please Read the [http://wiki.smc.org.in/GSoC/2015#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

=Projects with unconfirmed mentors=
== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that adds little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected according to grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words together. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
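
To make "multi-level suffix stripping" concrete, here is a minimal sketch of a recursive stripper. It is not the SILPA or Hunspell algorithm. The function name <code>analyse</code> and the toy English stems and suffixes are assumptions chosen for readability, and the sketch ignores sandhi, that is, the sound changes at the point where parts join, which is where the real difficulty lies.

```python
def analyse(word, stems, suffixes, max_depth=8):
    """Split word into [stem, suffix1, suffix2, ...] or return None.

    Strips one known suffix at a time from the end of the word and
    recurses until the remainder is a dictionary stem.
    """
    def strip(rest, depth):
        if rest in stems:
            return [rest]
        if depth == 0:
            return None
        for suffix in suffixes:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                parts = strip(rest[:-len(suffix)], depth - 1)
                if parts is not None:
                    return parts + [suffix]
        return None

    return strip(word, max_depth)
```

With <code>stems = {"help"}</code> and <code>suffixes = ["less", "ness", "ful"]</code>, the word <code>helplessness</code> is accepted through two levels of stripping even though only <code>help</code> is in the dictionary. A spellchecker would accept a word whenever such an analysis exists.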

The project is not about coding an existing algorithm, but about developing and implementing a new one.

Hunspell's limitations can be understood from [[User:%E0%B4%B8%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8B%E0%B4%B7%E0%B5%8D/HunspellConversation| this conversation]] we had with the author of Hunspell in 2008.

Homework to do before submitting applications:
# Use Hunspell in any Indian language like Malayalam for spell correction in editors or word processors and understand the limitations
# Study the nature of inflection and agglutination in Indian languages, read existing documents on this(ask for documents too) and note down your observations
# Study Hunspell and other spellcheckers to see how this problem is addressed
# Understand how a spell checker works. How to write a spellchecker from scratch?
# Come up with a plan about addressing the issue.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''':

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': An average-level understanding of the grammar system of at least one Indian language, and completion of the homework listed above.

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<code><pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre></code>
Generate the output using command
<code><pre>
texexec --xetex <file.tex>
</pre></code>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==SILPA Project Based==

===SILPA Project Improvements===

'''Project''':

This is a set of ideas needed to improve the existing SILPA infrastructure. We have decided on the following tasks as part of this project:

# Provide REST API to SILPA without disturbing existing JSONRPC API
# Improve the Transliteration module
# Integrate [https://github.com/Project-SILPA/flask-webfonts Flask Webfonts] extension with SILPA to provide Webfonts support.

==== Provide REST like API for SILPA ====

SILPA currently provides a JSONRPC API, which is also used by the framework's templates. JSONRPC is not well supported in all languages and results in [https://en.wikipedia.org/wiki/Not_invented_here NIH code]. So we would like to provide REST-like HTTP-based APIs for SILPA while leaving the current JSONRPC code untouched for backward compatibility.

'''Objectives''':

* Develop a module or use an existing module to provide REST-like APIs
* The API should support GET and POST. See [http://www.w3.org/2001/tag/doc/whenToUseGet.html When to use GET?].

Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.
Sample API calls:
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON encoded return value from function
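
The generic pattern above can be sketched without any web framework as a small dispatcher that maps <code>/api/module/function_name</code> to a Python callable and returns a JSON-encoded result. This is an illustration of the idea, not SILPA's code. The registry, the <code>expose</code> decorator, the <code>handle</code> function and the <code>demo/reverse</code> function are all hypothetical names. A real implementation would sit behind Flask routes and would also read POST bodies.

```python
import json
from urllib.parse import urlparse, parse_qs

REGISTRY = {}  # (module, function_name) -> callable

def expose(module, name):
    """Register a function under /api/<module>/<name>."""
    def decorator(func):
        REGISTRY[(module, name)] = func
        return func
    return decorator

@expose("demo", "reverse")
def reverse(text):
    # Stand-in for a real module function.
    return text[::-1]

def handle(url):
    """Return (status, json_body) for a GET-style URL."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3 or parts[0] != "api":
        return 404, json.dumps({"error": "not found"})
    func = REGISTRY.get((parts[1], parts[2]))
    if func is None:
        return 404, json.dumps({"error": "unknown function"})
    # Query parameters become the function's keyword arguments.
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    try:
        result = func(**params)
    except TypeError as exc:  # missing or unexpected parameter
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})
```

Because the dispatcher only looks functions up in a registry, the existing JSONRPC entry point could keep calling the same functions unchanged.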

==== Improve Transliteration module ====

We have a Transliteration module which supports transliteration from any Indic language to any other Indic language, as well as English to Indic and Indic to English transliteration. We also support the IPA and ISO 15919 transliteration systems. But the module isn't in perfect shape and has a lot of bugs. With this idea we would like to improve the following parts:

# Improve the cross-Indic language transliteration system. Currently only Malayalam and Kannada work without any external language support; all other Indian languages are first transliterated to Malayalam and then transliterated to the target Indic language. We want to remove this source -> Malayalam -> target cycle.
# English to IPA transliteration is currently broken and needs to be fixed. See [https://github.com/Project-SILPA/Transliteration/issues/3 IPA transliteration bug].
# Once the IPA transliteration issue above is fixed, improve the English to Indic transliteration system using IPA. Currently English to Indic transliteration is done using the CMU Sphinx dictionary, which has a limited set of words, which in turn limits the output of English to Indic transliteration.
# Improve the ISO 15919 to Indic transliteration system; see [https://github.com/Project-SILPA/Transliteration/issues/4 ISO 15919 to Indic transliteration].

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
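
As a baseline for direct script-to-script conversion without going through Malayalam, the sketch below uses the fact that the Unicode blocks for the major Indic scripts follow the ISCII layout, so corresponding letters sit at the same offset within each block. This is a naive illustration and not how the SILPA module is implemented. The function name and the table are assumptions, and some offsets are unassigned or used differently in a given script, which is exactly where exception rules come in.

```python
# First code point of each script's Unicode block (each is 0x80 wide).
BLOCK_START = {
    "devanagari": 0x0900,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80

def naive_transliterate(text, source, target):
    """Shift every character of the source block to the same offset
    in the target block and leave all other characters untouched."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)
```

For example, Devanagari KA (U+0915) maps to Malayalam KA (U+0D15) and Kannada KA (U+0C95) with no intermediate language involved.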

==== Integrating flask-webfonts extension with SILPA ====

SILPA used to have a Webfonts module for serving Indian language fonts as webfonts for browsers. During GSoC 2013 it was separated out as an extension to the Flask framework that can be used with any Flask-powered app. The current code can be found at [https://github.com/Project-SILPA/flask-webfonts]. The module is not fine-tuned yet, so the objectives are listed below.

# The module is not yet fine-tuned and using it makes other modules break. This needs to be fixed (it can be checked with the 'webfonts' branch of the SILPA code on GitHub).
# Write tests to check the functionality.
# Adhere to the Flask extension guidelines and submit the module to the Flask extensions directory.
# Write a tool which takes a directory containing font files, or a single font file, and generates the configuration file needed by the extension. (A possible such tool, now outdated, can be found at [https://github.com/copyninja/fontinfo].)
# Provide HTTP APIs through the Flask extension which expose the CSS for applications.

For all the tasks above we expect documentation and test cases from the students as deliverables.

'''Complexity''' : Intermediate

'''Confirmed Mentors''' :

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Mailing List''': silpa-discuss@nongnu.org <preferred>

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

# Writing applications using Flask
# Knowledge of various transliteration systems
# Webfonts knowledge and writing extensions for Flask
# Test-driven development.

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript helps developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - jishnu7, santhosh on #smc-project and #silpa on Freenode

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': javascript, python

'''What the students will learn''':

===Integrate Varnam into Silpa===

Create a Silpa module which hosts [http://www.varnamproject.com varnam]. This includes making a python port for libvarnam and making a Silpa module which uses the python port.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project and #silpa on Freenode

''' Mailing List''': silpa-discuss@nongnu.org

'''Expertise required''': C, Python

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] & [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

* [http://www.varnamproject.com/docs/faq FAQ]
* [http://www.varnamproject.com/docs Documentation]
* [http://www.varnamproject.com/docs/contributing Contributors guide & ideas to work on]

Apart from the following ideas, you can propose your own idea.

===Programming language bindings & varnam-daemon===

Varnam is written in C, which makes interoperability with other languages easy. There are language bindings available for `NodeJs` and `Ruby`. Supporting Varnam in multiple languages allows projects to use varnam easily to enable Indian language input.

To make using varnam from different languages easier, make a cross-platform standalone process which uses the `libvarnam` shared library and exposes an RPC API over the network. This allows any programming language with socket support to be used with libvarnam. This also makes language bindings fairly easy because they don't have to work with the native interoperability support. The protocol can be a simple text-based protocol covering all the commands that `libvarnam` supports.
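
A simple text-based protocol of this kind might look like the sketch below, where each request is one line of the form <code>COMMAND argument</code> and each reply is one line starting with <code>OK</code> or <code>ERR</code>. The command name, the reply format and the <code>fake_transliterate</code> backend are assumptions for illustration. A real daemon would call into <code>libvarnam</code> where the fake backend is used here.

```python
import socketserver

def fake_transliterate(text):
    # Placeholder backend; a real daemon would call libvarnam here.
    return [text.upper()]

# Hypothetical command table: protocol keyword -> handler function.
COMMANDS = {"TRANSLITERATE": fake_transliterate}

def handle_line(line, commands):
    """Parse 'COMMAND argument' and return a one-line text reply."""
    name, _, argument = line.strip().partition(" ")
    func = commands.get(name.upper())
    if func is None:
        return "ERR unknown command"
    try:
        results = func(argument)
    except Exception as exc:
        return f"ERR {exc}"
    # Multiple results are separated by tabs on a single line.
    return "OK " + "\t".join(results)

class VarnamHandler(socketserver.StreamRequestHandler):
    """Answers one reply line per request line on a TCP connection."""
    def handle(self):
        for raw in self.rfile:
            reply = handle_line(raw.decode("utf-8"), COMMANDS)
            self.wfile.write((reply + "\n").encode("utf-8"))

# To serve requests (blocks until interrupted):
# socketserver.TCPServer(("127.0.0.1", 9999), VarnamHandler).serve_forever()
```

Keeping the parsing in <code>handle_line</code>, separate from the socket code, means the protocol can be tested without opening a network connection, and any language that can write a line to a socket can act as a client.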

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': C

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in `golang` and adding support for WebSockets to improve the input experience. It also includes making scripts that ease embedding input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Create an Android IME===

Varnam will be ported as a Silpa module and it will be available on Android as part of the android SDK project which Silpa has proposed. This idea is merged to the [http://wiki.smc.org.in/SoC/2014/Project_ideas#Android_SDK_for_Silpa Silpa] project ideas.

===Enable varnam's suggestions system to be used from Inscript or any other input system===

Varnam has knowledge about a lot of words. This idea proposes a method to use these words and provide suggestions for other input systems. Basically, in Varnam, the API call will be something like:

<code><pre>
varnam_get_suggestions (handle, "भारत");
</pre></code>

This will fetch all the suggestions which have the given prefix.

`varnam_get_suggestions` needs to keep track of the previous words and use an [http://en.wikipedia.org/wiki/N-gram n-gram] based dataset to filter the results. It should also learn the words back into the word corpus that varnam is using. Filtering suggestions won't be just a prefix search; it will use knowledge about how text can be written in the target language to provide smart filtering. Searching a large corpus while providing real-time suggestions makes this a challenging task.

Once this is implemented in `libvarnam`, it can be used in the ibus-engine.
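
The difference between a plain prefix search and n-gram based filtering can be sketched as follows. This is a Python illustration of the idea only. The real work would be done in C inside `libvarnam`, and the class and method names here are hypothetical. The sketch uses bigrams (n = 2) and a linear scan, whereas a real implementation would need an index to stay responsive on a large corpus.

```python
from collections import Counter, defaultdict

class BigramSuggester:
    def __init__(self):
        self.unigrams = Counter()            # word -> frequency
        self.bigrams = defaultdict(Counter)  # previous word -> next-word counts

    def learn(self, sentence):
        """Learn the words and word pairs of a sentence into the corpus."""
        words = sentence.split()
        self.unigrams.update(words)
        for prev, word in zip(words, words[1:]):
            self.bigrams[prev][word] += 1

    def suggest(self, prefix, previous=None, limit=5):
        """Return words starting with prefix, ranked first by how often
        they followed the previous word, then by overall frequency."""
        candidates = [w for w in self.unigrams if w.startswith(prefix)]
        following = self.bigrams.get(previous, Counter())
        candidates.sort(key=lambda w: (-following[w], -self.unigrams[w], w))
        return candidates[:limit]
```

Without context, the most frequent word with the given prefix comes first. Once the previous word is known, a less frequent word that commonly follows it is ranked higher.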

'''Complexity''' : Advanced

'''Expertise required''': C, Unicode & encodings

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' :

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2014

2014-03-10T16:22:49Z

Nandaja: /* FAQ */

<div class='grid'>
<span class='row highlight'>SMC is participating in Google Summer of Code 2014</span>
<div class='toc row'>
<div class='three columns'>
[[SoC/2014/Project_ideas|'''Project ideas''']]
</div>
<div class='three columns'>
[[SoC/2014/application-template|'''Student application Template''']]
</div>
</div>
[[file:Gsoc2014.png|right|thumb]]
==Status Updates==

'''February 25 , 2014'''
<div class='row'>
<div class='six columns'>
We are really happy to announce that we have been selected for Google Summer of Code 2014. Google Summer of Code (GSoC) is a program that offers student developers stipends to write code for various open source projects. This is the third time we have been selected as a mentoring organization.

If you are a student and would be interested in participating in GSoC with Swathanthra Malayalam Computing as your mentoring organization, please take a look at our [http://wiki.smc.org.in/SoC/2014/Project_ideas GSoC Ideas] page.

If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]
</div>
</div>

'''February 4 , 2014'''
<div class='row'>
<div class='six columns'>
* Mentoring Organization Applications Now Being Accepted for Google Summer of Code 2014! - [http://google-opensource.blogspot.in/2014/02/mentoring-organization-applications-now.html Read More]
</div>
</div>

==FAQ==
* '''Is it a requirement to know Malayalam to participate in GSoC as part of SMC?'''

It is not a requirement to know Malayalam to participate in GSoC as part of SMC. But it will be good if you know some Indian language well, along with the listed technologies.

* '''I have a project idea that is not listed in SMC project ideas. Can I propose new projects?'''

Of course. You are encouraged to propose any fresh project ideas with as much detail as you can give. If the idea matches the objectives of SMC, we will be happy to evaluate it for GSoC. SMC is the umbrella project for Project Silpa, an Indian language computing project, so you are welcome to submit proposals for other languages too. Be aware that the availability of mentors for non-Malayalam languages is limited.

* '''I'm interested in the SILPA project ideas. What should I do?'''
Glad to know. The first thing is to clone the code, read it, try to deploy it, and read the documentation [2]. If you face problems, ask us on IRC, on this list, or on the SILPA mailing list [3].

* '''I've set up SILPA locally and read the code. Now what?'''
Good. Now read the tasks we have mentioned in the ideas list, work out which module each one refers to, and understand the code. Show us proofs of concept of what you have worked on, and propose your own improvements to the code.

* '''I saw several ideas for the SILPA project, but now I see they have been clubbed into a single one. So is it a single idea now or multiple?'''
It is now clubbed into a single idea because we felt the individual ideas are pretty simple to implement, so we merged them. The collected ideas are together called *SILPA improvements*. The Varnam idea and the other JavaScript and Android SDK ideas are still independent ideas.

[1] http://wiki.smc.org.in/SoC/2014/Project_ideas#SILPA_Project_Based

[2] http://silpa.readthedocs.org

[3] silpa-discuss at nongnu.org

==Links==
* [https://www.google-melange.com/gsoc/homepage/google/gsoc2014 Google Summer of Code 2014 official website]
* [https://developers.google.com/open-source/soc/ GSoC page in Google Developers Website]
* [http://google-opensource.blogspot.in/search/label/gsoc GSoC News in Google Opensource Blog]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]
* [https://www.youtube.com/watch?v=xQyyr_a9rQ4 Student application process]
* [http://www.google-melange.com/gsoc/events/google/gsoc2014 GSOC 2014 timeline]

== In news ==
* http://beta.mangalam.com/tech/tech-news/153590
* http://news.keralakaumudi.com/news.php?nid=5e08ab8bd979a8b12cf8acbbec7835b8
* http://www.mathrubhumi.com/technology/others/google-summer-of-code-2014-smc-swathanthra-malayalam-computing-malayalam-computing-433388/
* http://www.mediaonetv.in/news/22571/fri-02282014-1932
* http://digitalpaper.mathrubhumi.com/c/2477711

GSoC/2014

2014-03-10T16:22:04Z

Nandaja: /* FAQ */

<div class='grid'>
<span class='row highlight'>SMC is participating in Google Summer of Code 2014</span>
<div class='toc row'>
<div class='three columns'>
[[SoC/2014/Project_ideas|'''Project ideas''']]
</div>
<div class='three columns'>
[[SoC/2014/application-template|'''Student application Template''']]
</div>
</div>
[[file:Gsoc2014.png|right|thumb]]
==Status Updates==

'''February 25 , 2014'''
<div class='row'>
<div class='six columns'>
We are really happy to announce that we have been selected for Google Summer of Code 2014. Google Summer of Code (GSoC) is a program that offers student developers stipends to write code for various open source projects. This is the third time we have been selected as a mentoring organization.

If you are a student and would be interested in participating in GSoC with Swathanthra Malayalam Computing as your mentoring organization, please take a look at our [http://wiki.smc.org.in/SoC/2014/Project_ideas GSoC Ideas] page.

If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]
</div>
</div>

'''February 4 , 2014'''
<div class='row'>
<div class='six columns'>
* Mentoring Organization Applications Now Being Accepted for Google Summer of Code 2014! - [http://google-opensource.blogspot.in/2014/02/mentoring-organization-applications-now.html Read More]
</div>
</div>

==FAQ==
* '''Is it a requirement to know Malayalam to participate in GSoC as part of SMC?'''

It is not a requirement to know Malayalam to participate in GSoC as part of SMC. But it will be good if you know some Indian language well, along with the listed technologies.

* '''I have a project idea that is not listed in SMC project ideas. Can I propose new projects?'''

Of course. You are encouraged to propose any fresh project ideas with as much detail as you can give. If the idea matches the objectives of SMC, we will be happy to evaluate it for GSoC. SMC is the umbrella project for Project Silpa, an Indian language computing project, so you are welcome to submit proposals for other languages too. Be aware that the availability of mentors for non-Malayalam languages is limited.

* '''I'm interested in the SILPA project ideas. What should I do?'''
Glad to know. The first thing is to clone the code, read it, try to deploy it, and read the documentation [2]. If you face problems, ask us on IRC, on this list, or on the SILPA mailing list [3].

* '''I've set up SILPA locally and read the code. Now what?'''
Good. Now read the tasks we have mentioned in the ideas list, work out which module each one refers to, and understand the code. Show us proofs of concept of what you have worked on, and propose your own improvements to the code.

* '''I saw several ideas for the SILPA project, but now I see they have been clubbed into a single one. So is it a single idea now or multiple?'''
It is now clubbed into a single idea because we felt the individual ideas are pretty simple to implement, so we merged them. The collected ideas are together called *SILPA improvements*. The Varnam idea and the other JavaScript and Android SDK ideas are still independent ideas.

[1] http://wiki.smc.org.in/SoC/2014/Project_ideas#SILPA_Project_Based
[2] http://silpa.readthedocs.org
[3] silpa-discuss at nongnu.org

==Links==
* [https://www.google-melange.com/gsoc/homepage/google/gsoc2014 Google Summer of Code 2014 official website]
* [https://developers.google.com/open-source/soc/ GSoC page in Google Developers Website]
* [http://google-opensource.blogspot.in/search/label/gsoc GSoC News in Google Opensource Blog]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]
* [https://www.youtube.com/watch?v=xQyyr_a9rQ4 Student application process]
* [http://www.google-melange.com/gsoc/events/google/gsoc2014 GSOC 2014 timeline]

== In news ==
* http://beta.mangalam.com/tech/tech-news/153590
* http://news.keralakaumudi.com/news.php?nid=5e08ab8bd979a8b12cf8acbbec7835b8
* http://www.mathrubhumi.com/technology/others/google-summer-of-code-2014-smc-swathanthra-malayalam-computing-malayalam-computing-433388/
* http://www.mediaonetv.in/news/22571/fri-02282014-1932
* http://digitalpaper.mathrubhumi.com/c/2477711

GSoC/2014/Project ideas

2014-02-28T12:57:36Z

Nandaja: /* Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. We could expand the dictionary, but that adds little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected according to grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words together. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification, or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and to predict the occurrence of specific word sequences possible in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu, and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
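
As an illustration of what "the likelihood of a sequence of words" means, here is a minimal Python sketch of an unsmoothed bigram language model. It is not how the Sphinx tools build their models; production toolkits add smoothing and back-off so that unseen word pairs do not get zero probability. The function names and the romanized sample sentences used with it are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count word histories and word pairs, with sentence boundary markers."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])  # every word except </s> acts as a history
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams


def bigram_probability(unigrams, bigrams, previous, word):
    """Maximum-likelihood estimate of P(word | previous), without smoothing."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]


def sentence_probability(unigrams, bigrams, sentence):
    """Multiply the bigram probabilities along the sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(words[:-1], words[1:]):
        probability *= bigram_probability(unigrams, bigrams, previous, word)
    return probability
```

Trained on a handful of sentences, the model assigns higher probability to word orders it has seen and zero to orders it has not, which is the behavior the recognizer relies on when choosing between acoustically similar hypotheses.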

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSONRPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Objectives''':
*Use Flask-RESTful and write a separate module without modifying the existing JSONRPC code. JSONRPC needs to remain available for backward compatibility.
*Both GET and POST should be supported, so that the developer can decide which to use. (Do we need this?)
Many people are unsure what the API should look like. The Twitter API (https://dev.twitter.com/docs/api) can serve as an example.

Sample API calls:
<pre>
-------------------------------------------------------------
POST api.silpa.org.in/payyans/ASCII2Unicode
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
POST api.silpa.org.in/payyans/Unicode2ASCII
Parameters: text, font
Response: JSON data
-------------------------------------------------------------
Generic:
GET/POST (http://api.silpa.org.in/module/function_name or http://silpa.org.in/api/module/function_name)
Parameters: function parameters
Response: JSON-encoded return value from the function
</pre>
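
The generic <code>/module/function_name</code> pattern can be sketched without any web framework. The Python below is only an illustration of the routing idea, not Flask-RESTful code and not the Silpa implementation. The <code>ascii_to_unicode</code> function, the <code>ROUTES</code> table, and the <code>handle_request</code> helper are hypothetical stand-ins; the real payyans module may expose a different signature.

```python
import json


def ascii_to_unicode(text, font):
    """Hypothetical stand-in for a Silpa module function; it converts nothing."""
    return {"text": text, "font": font}


# (module, function_name) -> callable, mirroring /module/function_name URLs.
ROUTES = {("payyans", "ASCII2Unicode"): ascii_to_unicode}


def handle_request(path, params):
    """Map a URL path plus request parameters to a (status, JSON body) pair."""
    parts = tuple(p for p in path.split("/") if p)
    if len(parts) != 2 or parts not in ROUTES:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        result = ROUTES[parts](**params)
    except TypeError as exc:  # missing or unexpected parameters
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})
```

Because the handler only sees a path and a parameter dictionary, the same table could serve both GET and POST, and the existing JSONRPC entry point can stay in place next to it.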

'''Complexity''' : Easy

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port as many Silpa modules as possible to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration, and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Objectives''':
<Please note this idea is for a SDK, not an app or just a java port>

*All modules need to be ported to Java so that they can be used inside an Android project.
*Other applications should be able to use this Silpa library to easily integrate features from our modules (as an SDK). For example:
**Transliteration - The developer can specify that a text input inside the application needs transliteration, and our SDK should take care of the transliteration process whenever the user enters text in that field.
**Render module - Detect whether the necessary font is available on the system; if it is not, render the text as an image and replace the text with it.
**All modules can be explained like this.
*Investigate whether the image rendering part of the render module can be done on the device, inside the application itself. A few ways to implement that are:
**Compiling cairo/pango with the NDK
**Compiling HarfBuzz from the AOSP tree with the NDK
**Based on the result of the rendering module investigation, we can decide whether to use server-side rendering or not.
**Pack popular fonts with the SDK and use them to display text if the device doesn't have the required font. (There are a few hacks to get better rendering in older versions of Android.) The developer should be able to force rendering using a packaged font, to get consistency across devices.

<Better to prepare SDK with helper than preparing application itself. SDK aka library>

'''Complexity''' : Advanced

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python. Porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. Flexible fuzzy search on web pages will be possible once we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam work well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal support. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to the words in CMUDict; we could probably develop a better method for English to Indic transliteration. The current Indic to English transliteration, and vice versa, depends on the CMUSphinx dictionary, which has a limited set of words, so some words are left in native text.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
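
To show why an intermediate representation helps, here is a minimal Python sketch that uses a much simpler pivot than IPA or ISO 15919: the offset of a character inside its Unicode block. The Unicode blocks of the major Indic scripts follow the ISCII layout, so corresponding letters mostly share the same offset. This is a stand-in technique for illustration only, not the Silpa algorithm, and the function and table names are hypothetical.

```python
# Start of each script's Unicode block; every block is 0x80 code points wide.
BLOCK_START = {
    "devanagari": 0x0900,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Move each character of the source script to the same offset in the
    target script's block; leave every other character untouched."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src  # the script-neutral pivot value
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(dst + offset))
        else:
            out.append(ch)
    return "".join(out)
```

With a pivot, each language needs only one mapping to and from the intermediate form, so no language has to go through Malayalam. The offset pivot breaks wherever a script has letters with no counterpart at the same position (Malayalam chillu letters are one case), which is where the exception rules mentioned above come in.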

'''Complexity''' : Easy

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for anyone who wants to type in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should have webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done''' (task from last GSoC)
* flask-webfonts needs further improvements and fine tuning
* This should be integrated into our SILPA and submitted to Flask

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' : Medium

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
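
The tagging and filtering steps can be sketched as follows. This is Python for illustration only; Diaspora itself is a Ruby on Rails application, and a real implementation would likely use a proper language detection library. The sketch guesses the language from the Unicode script of each character, which is an assumption that fails for scripts shared by several languages. All names are hypothetical.

```python
# Unicode block ranges per language tag. Assumption: one script means one
# language, which is not true in general (Devanagari serves several).
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "ta": (0x0B80, 0x0BFF),  # Tamil
    "kn": (0x0C80, 0x0CFF),  # Kannada
    "ml": (0x0D00, 0x0D7F),  # Malayalam
}


def detect_languages(text):
    """Return the sorted language tags whose script appears in the text."""
    found = set()
    for ch in text:
        if ch.isascii() and ch.isalpha():
            found.add("en")  # assumption: Latin letters mean English
            continue
        for tag, (low, high) in SCRIPT_RANGES.items():
            if low <= ord(ch) <= high:
                found.add(tag)
                break
    return sorted(found)


def visible_posts(posts, preferred, overrides=None):
    """Keep the ids of posts sharing at least one language with the reader.

    posts: list of (post_id, text) pairs.
    overrides: {post_id: [tags]} chosen manually by the post author.
    """
    overrides = overrides or {}
    kept = []
    for post_id, text in posts:
        tags = overrides.get(post_id) or detect_languages(text)
        if set(tags) & set(preferred):
            kept.append(post_id)
    return kept
```

Showing a mixed-language post when any one of its languages matches the reader's preference is a design choice made here for the sketch; the project could equally decide to require all languages to match.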

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal of this project is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The proposed improvement tries to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is to infer the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the varnam framework so that new language support can be added easily.
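
A minimal Python sketch of the base-form inference is shown below. It does plain longest-suffix stripping driven by a per-language table, which is far simpler than a full Porter stemmer and is not varnam code. The <code>SUFFIXES</code> table, its single entry, and both function names are hypothetical.

```python
# Hypothetical per-language suffix table; new languages would add entries here.
SUFFIXES = {
    "hi": ["\u0940\u092f"],  # the two code points that turn "भारत" into "भारतीय"
}


def infer_stem(word, language):
    """Strip the longest matching suffix; return the word unchanged otherwise."""
    candidates = sorted(SUFFIXES.get(language, []), key=len, reverse=True)
    for suffix in candidates:
        # Never strip a suffix that would leave an empty stem.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)]
    return word


def learn(word, language, learned):
    """Record the word and its inferred base form in the learned set."""
    learned.add(word)
    learned.add(infer_stem(word, language))
    return learned
```

Keeping the suffix rules in data rather than in code is what would let new languages be added without touching the learning logic.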

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online words corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F, J, K, and L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.
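
The six-key entry can be sketched in Python as follows. The sketch maps a chord of keys to a Unicode Braille pattern, where dot n corresponds to bit n-1 above U+2800. The key-to-dot assignment (F, D, S for dots 1 to 3 and J, K, L for dots 4 to 6) is an assumption modelled on the usual Perkins-style convention; the text above only says that the SDF-JKL keys are used. Mapping each Braille cell to a letter of a particular Indian language is a separate, language-specific table that is not shown.

```python
# Assumed key-to-dot assignment (Perkins-style); not confirmed by the source.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}
BRAILLE_BASE = 0x2800  # Unicode code point of the blank Braille pattern


def chord_to_braille(keys):
    """Convert keys pressed together into one Unicode Braille pattern."""
    mask = 0
    for key in keys:
        # Dot n of the cell is stored in bit n-1 of the code point offset.
        mask |= 1 << (KEY_TO_DOT[key.lower()] - 1)
    return chr(BRAILLE_BASE + mask)
```

An m17n layout would express the same chord table declaratively, but the arithmetic above is what decides which Braille cell a chord produces.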

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
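
MARC21 records are exchanged in the ISO 2709 layout: a 24-character leader, a directory of 12-character entries (tag, field length, start offset), and then the field data. The Python sketch below round-trips that layout for data fields only (tags 010 and above, with two indicators and subfields); control fields and character set handling beyond UTF-8 are left out. It is an illustration of the format, not Grandham code, which is written in Ruby, and the function names are hypothetical.

```python
FIELD_END, RECORD_END, SUBFIELD = b"\x1e", b"\x1d", b"\x1f"
LEADER_LENGTH, ENTRY_LENGTH = 24, 12


def export_record(fields):
    """Serialize [(tag, indicators, [(code, value), ...]), ...] to bytes."""
    directory, body = b"", b""
    for tag, indicators, subfields in fields:
        data = indicators.encode("utf-8")
        for code, value in subfields:
            data += SUBFIELD + code.encode("utf-8") + value.encode("utf-8")
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = LEADER_LENGTH + len(directory) + 1  # +1 for the directory terminator
    total = base + len(body) + 1               # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def import_record(raw):
    """Parse bytes produced in the layout above back into a field list."""
    base = int(raw[12:17].decode("ascii"))  # base address of the field data
    directory = raw[LEADER_LENGTH:base - 1]
    fields = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        parts = data.split(SUBFIELD)
        indicators = parts[0].decode("utf-8")
        subfields = [(p[:1].decode("utf-8"), p[1:].decode("utf-8"))
                     for p in parts[1:]]
        fields.append((tag, indicators, subfields))
    return fields
```

Lengths and offsets are counted in bytes, not characters, which matters for Malayalam text because each letter takes three bytes in UTF-8.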

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:50:15Z

Nandaja: /* Improving cross language transliteration system. */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a fairly involved algorithm. It still cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has about 150,000 words. We can expand the dictionary, but that has limited value, since words in Malayalam, Tamil, etc. can be formed by joining multiple words. In addition, words are inflected according to grammatical rules (sandhi), number, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help too. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much better suited to design work. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses HarfBuzz for correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for the MKIV Lua code is available. ConTeXt MKII (deprecated) can work with the XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command:
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification, or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and to predict the occurrence of specific word sequences possible in it. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed with the Sphinx engine for the Indian languages Hindi, Tamil, Telugu, and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSONRPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port as many Silpa modules as possible to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration, and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Objectives''':
<Please note this idea is for a SDK, not an app or just a java port>

*All modules need to be ported to Java so that they can be used inside an Android project.
*Other applications should be able to use this Silpa library to easily integrate features from our modules (as an SDK). For example:
**Transliteration - The developer can specify that a text input inside the application needs transliteration, and our SDK should take care of the transliteration process whenever the user enters text in that field.
**Render module - Detect whether the necessary font is available on the system; if it is not, render the text as an image and replace the text with it.
**All modules can be explained like this.
*Investigate whether the image rendering part of the render module can be done on the device, inside the application itself. A few ways to implement that are:
**Compiling cairo/pango with the NDK
**Compiling HarfBuzz from the AOSP tree with the NDK
**Based on the result of the rendering module investigation, we can decide whether to use server-side rendering or not.
**Pack popular fonts with the SDK and use them to display text if the device doesn't have the required font. (There are a few hacks to get better rendering in older versions of Android.) The developer should be able to force rendering using a packaged font, to get consistency across devices.

<Better to prepare SDK with helper than preparing application itself. SDK aka library>

'''Complexity''' : Advanced

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python. Porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. Flexible fuzzy search on web pages will be possible once we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam work well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal support. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to the words in CMUDict; we could probably develop a better method for English to Indic transliteration. The current Indic to English transliteration, and vice versa, depends on the CMUSphinx dictionary, which has a limited set of words, so some words are left in native text.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

'''Complexity''' : Easy

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for anyone who wants to type in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should have webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done''' (task from last GSoC)
* flask-webfonts needs further improvements and fine tuning
* This should be integrated into our SILPA and submitted to Flask

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' : Medium

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
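
The tagging and filtering idea can be sketched as follows. This is only an illustration in Python (Diaspora itself is a Ruby on Rails application), and the function names <code>detect_scripts</code> and <code>is_visible</code> are hypothetical. It detects the writing script rather than the language, which is a simplification: several languages share a script (Hindi and Marathi both use Devanagari), so a real implementation would need a proper language detector.

```python
import unicodedata


def detect_scripts(text):
    """Return the set of script names used by the letters in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # For Latin and the major Indic scripts, the first word of the
            # Unicode character name is the script, e.g. "MALAYALAM LETTER MA".
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_visible(post_text, preferred_scripts, override=None):
    """Decide whether a post is shown to a reader.

    override is the tag set chosen manually by the author, if any.
    Posts without any letters (numbers, emoji) are always shown.
    """
    tags = override if override is not None else detect_scripts(post_text)
    return not tags or bool(tags & preferred_scripts)


if __name__ == "__main__":
    print(detect_scripts("Hello മലയാളം"))
    print(is_visible("भारत", {"MALAYALAM"}))
```

Keeping the detected tags separate from the manual override means an author can always correct a wrong guess without the detector being rerun.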

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] addons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in <code>golang</code> and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes when learning words. Today, when a word is learned, Varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. Sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

Varnam's built-in learning system can learn words as well as the alternative ways of writing them. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is to infer the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the Varnam framework so that new language support can be added easily.
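
A minimal sketch of the two steps, written in Python for brevity (libvarnam itself is in C). The function names and the one-entry suffix table are hypothetical, and the code works on Unicode code points, whereas Varnam tokenizes by its own rules.

```python
# Hypothetical per-language suffix table; a real one would be far larger
# and would be supplied by each language's scheme.
SUFFIXES = ["ीय"]


def prefixes(word, min_len=2):
    """All prefixes of word with at least min_len code points."""
    return [word[:i] for i in range(min_len, len(word) + 1)]


def infer_base(word, suffixes=SUFFIXES):
    """Strip the longest known suffix to guess the base form of word."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def words_to_learn(word):
    """The word, its inferred base form, and the prefixes of both."""
    learned = []
    for form in (word, infer_base(word)):
        for item in prefixes(form):
            if item not in learned:
                learned.append(item)
    return learned


if __name__ == "__main__":
    print(infer_base("भारतीय"))
    print(words_to_learn("भारतीय"))
```

Keeping the suffix list as plain data is what would let a new language be added without touching the stemming code.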

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs like varnam-ibus [2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts in GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F, J, K and L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have studied Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.
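
The six-key input can be illustrated with a short Python sketch that turns a chord of keys into a Unicode Braille pattern. The key-to-dot assignment below (F, D, S for dots 1 to 3 and J, K, L for dots 4 to 6) is an assumption in the style of a Perkins brailler; the layout finally adopted may differ. Mapping each cell to a Bharati Braille letter needs a per-language table, which is not shown here.

```python
# Assumed key-to-dot assignment; the real layout may differ.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

# The Unicode Braille Patterns block starts at U+2800, and dot n of a cell
# corresponds to bit n-1 of the offset from that code point.
BRAILLE_BASE = 0x2800


def chord_to_braille(keys):
    """Convert keys pressed together (e.g. "fd") into one Braille cell."""
    mask = 0
    for key in keys.lower():
        mask |= 1 << (KEY_TO_DOT[key] - 1)
    return chr(BRAILLE_BASE + mask)


if __name__ == "__main__":
    for chord in ("f", "fd", "fdj", "sdfjkl"):
        print(chord, chord_to_braille(chord))
```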

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
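
To give an idea of the format, here is a rough Python sketch of the MARC 21 record structure: a 24-character leader, a directory of 12-character entries (tag, field length, start offset) and the field data. It is not Grandham code (Grandham is a Rails application) and the helper names are hypothetical. It handles a single record and ignores indicators, subfields and character encodings, so a real import/export feature would more likely build on a maintained MARC library.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator


def build_marc(fields):
    """Serialize a list of (tag, data) pairs into one MARC 21 record."""
    directory, body = b"", b""
    for tag, data in fields:
        chunk = data + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    base = 24 + len(directory) + 1      # leader + directory + terminator
    total = base + len(body) + 1        # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def parse_marc(record):
    """Parse one MARC 21 record back into a list of (tag, data) pairs."""
    base = int(record[12:17])           # base address of data, from the leader
    directory = record[24:base - 1]     # without its field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from the data.
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields


if __name__ == "__main__":
    sample = [("001", b"12345"), ("245", b"10\x1faA sample title")]
    print(parse_marc(build_marc(sample)))
```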

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:47:04Z

Nandaja: /* Android SDK for Silpa */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that doesn't add much value, since words in Malayalam, Tamil etc. can be formed by joining multiple words. In addition, words get inflected based on grammar forms (sandhi), plural, gender etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
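
If the Python route is taken, multi-level suffix stripping could look roughly like the sketch below. The root and suffix lists are toy romanized placeholders and are not linguistically accurate; in particular, real Malayalam sandhi changes the letters at the joint, which plain string stripping does not model. The function name <code>is_known</code> is hypothetical.

```python
# Toy data for illustration only; not real Malayalam morphology.
ROOTS = {"mara"}
SUFFIXES = ["ngal", "il", "um"]


def is_known(word, roots=ROOTS, suffixes=SUFFIXES, max_depth=8):
    """Accept word if it is a root, or a root plus up to max_depth suffixes."""
    if word in roots:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        stem = word[: -len(suffix)]
        # Strip one suffix and check the remaining stem recursively.
        if stem and word.endswith(suffix) and is_known(
            stem, roots, suffixes, max_depth - 1
        ):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("mara", "marangal", "marangalilum", "maraxyz"):
        print(candidate, is_known(candidate))
```

The <code>max_depth</code> limit makes the number of stripping levels explicit, which is the point where Hunspell is reported to fall short.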

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
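
What a language model stores can be shown with a tiny bigram model in Python. This is only a conceptual sketch with add-one smoothing and placeholder words; it is not how the CMU Sphinx tools build their models, and the function names are hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count words and word pairs in a corpus of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"]]           # placeholder words
    uni, bi = train_bigrams(corpus)
    vocab_size = 4                              # a, b, c and the end marker
    print(bigram_prob("<s>", "a", uni, bi, vocab_size))
    print(bigram_prob("a", "b", uni, bi, vocab_size))
```

A recognizer combines such probabilities with acoustic model scores to choose between word sequences that sound alike.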

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori1, 2, Hussein Hiyassat3, Mostafa Harti1, 2, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port the Silpa modules that can be ported to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyas have really good potential on Android because of the fragmentation that exists in Android and the lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Objectives''':
<Please note this idea is for a SDK, not an app or just a java port>

*All modules need to be ported to Java so that they can be used inside an Android project.
*Other applications should be able to use this Silpa library to easily integrate features (as an SDK) from our modules. E.g.
**Transliteration - The developer can specify that a text input inside the application needs transliteration, and our SDK should take care of the transliteration process whenever the user inputs text into that field.
**Render module - Detect whether the necessary font is available in the system; if it is not, render the text as an image and replace the text with it.
**All modules can be explained like this.
*Investigate whether the image rendering part of the render module can be done on the device, inside the application itself. A few ways to implement that are
**Compiling cairo/pango with the NDK
**Compiling Harfbuzz from the AOSP tree with the NDK
**Based on the result of the rendering module investigation, we can decide whether to use server side rendering or not.
**Pack popular fonts with the SDK and use them to display text if the device doesn't have the required font (there are a few hacks to get better rendering in older versions of Android). The developer should be able to force rendering using the packaged font, to get consistency across devices.

<Better to prepare SDK with helper than preparing application itself. SDK aka library>

'''Complexity''' : Advanced

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript would help developers. For example, cross language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

The proposed JavaScript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled perfectly; all other languages are first converted to Malayalam and then to English, due to the lack of an internal representation for those languages. Also, for English to Indic we currently use CMUDict, so transliteration is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and see the feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special case handling to achieve more accurate results.
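
As a point of comparison for any table-driven approach, a very simple baseline is sketched below in Python. It relies on the fact that the Unicode blocks of the major Indic scripts follow the ISCII layout, so corresponding letters mostly sit at the same offset within each 128-code-point block. This is not the current SILPA algorithm, the names are hypothetical, and the result can contain code points that are unassigned in the target script, which is exactly where the exception rules mentioned above come in.

```python
# Start of each script's 128-code-point Unicode block.
BLOCK_START = {
    "devanagari": 0x0900,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source script to the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        code = ord(ch)
        if src <= code < src + BLOCK_SIZE:
            out.append(chr(code - src + dst))
        else:
            out.append(ch)  # leave spaces, digits, Latin text untouched
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("ಕನ್ನಡ", "kannada", "malayalam"))
    print(transliterate("भारत", "devanagari", "malayalam"))
```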

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but it currently has no built-in tool for typing in Indian languages. The application is also not internationalized. Both can be achieved using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be done in the SILPA Flask framework with a nice templating system. Similarly, the interface should use webfonts through the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done''' (task from last GSoC)
* flask-webfonts needs further improvements and fine tuning
* This should be integrated into our SILPA and submitted to Flask

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' : Medium

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, CSS, HTML5, Python, Flask, technical understanding of fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] addons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in <code>golang</code> and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes when learning words. Today, when a word is learned, Varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. Sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

Varnam's built-in learning system can learn words as well as the alternative ways of writing them. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is to infer the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the Varnam framework so that new language support can be added easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs like varnam-ibus [2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts in GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F, J, K and L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have studied Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:45:59Z

Nandaja: /* Android SDK for Silpa */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that doesn't add much value, since words in Malayalam, Tamil etc. can be formed by joining multiple words. In addition, words get inflected based on grammar forms (sandhi), plural, gender etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using the command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt and basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, OpenType specifications or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have really good potential on Android because of the fragmentation that exists in Android and the lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Objectives''':
<Please note this idea is for a SDK, not an app or just a java port>

#All modules need to be ported to Java so that they can be used inside an Android project.
#Other applications should be able to use this Silpa library to easily integrate features from our modules (as an SDK). For example:
*Transliteration - A developer can specify that a text input inside the application needs transliteration, and our SDK should take care of the transliteration process whenever the user enters text in that field.
*Render module - Detect whether the necessary font is available on the system; if it is not, render the text as an image and replace the text with it.
*All modules can be explained like this.
#Investigate whether the image rendering part of the render module can be done on the device, inside the application itself. A few ways to implement that are:
*Compiling cairo/pango with the NDK
*Compiling HarfBuzz from the AOSP tree with the NDK
*Based on the result of the render module investigation, we can decide whether to use server-side rendering or not.
*Pack popular fonts with the SDK and use them to display text if the device doesn't have the required font (there are a few hacks to get better rendering in older versions of Android). The developer should be able to force rendering using the packaged font, to get consistency across devices.

<Better to prepare SDK with helper than preparing application itself. SDK aka library>

'''Complexity''' : Advanced

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.
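To give a feel for the kind of algorithm that would be ported, here is a minimal fuzzy search sketch in Python, the language of the existing modules. It uses plain Levenshtein edit distance, which is an assumption for illustration and not necessarily the algorithm used by the SILPA approximate search module; the function names and word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, words, max_distance=1):
    """Return the words within max_distance edits of the query."""
    return [word for word in words if edit_distance(query, word) <= max_distance]


matches = fuzzy_search("bharat", ["bharath", "bhaarath", "kerala"])
```

The logic uses only lists and string indexing, so a JavaScript port following the UMD pattern would be a near line-for-line translation.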

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal support for them. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to words in CMUDict only. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.
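The idea of an intermediate representation can be sketched as follows: each script is mapped to a shared pivot of ISO 15919-style romanised syllables, and the target script is generated from that pivot. The tables below are deliberately tiny (three consonants, two vowel signs) and the function names are hypothetical; this is not the existing SILPA implementation, and characters outside the tables raise a KeyError.

```python
# Tiny illustrative tables; real tables would cover the whole script.
MALAYALAM = {
    "consonants": {"\u0d15": "k", "\u0d2e": "m", "\u0d32": "l"},  # ക മ ല
    "vowel_signs": {"\u0d3e": "ā", "\u0d3f": "i"},                # ാ ി
    "virama": "\u0d4d",
}
KANNADA = {
    "consonants": {"\u0c95": "k", "\u0cae": "m", "\u0cb2": "l"},  # ಕ ಮ ಲ
    "vowel_signs": {"\u0cbe": "ā", "\u0cbf": "i"},                # ಾ ಿ
    "virama": "\u0ccd",
}


def to_pivot(text, script):
    """Convert text to (consonant, vowel) pairs; vowel is "" after a virama."""
    syllables = []
    i = 0
    while i < len(text):
        consonant = script["consonants"][text[i]]
        following = text[i + 1] if i + 1 < len(text) else ""
        if following in script["vowel_signs"]:
            syllables.append((consonant, script["vowel_signs"][following]))
            i += 2
        elif following and following == script["virama"]:
            syllables.append((consonant, ""))
            i += 2
        else:
            syllables.append((consonant, "a"))  # inherent vowel
            i += 1
    return syllables


def from_pivot(syllables, script):
    """Generate text in the target script from pivot syllables."""
    consonants = {roman: char for char, roman in script["consonants"].items()}
    signs = {roman: char for char, roman in script["vowel_signs"].items()}
    output = []
    for consonant, vowel in syllables:
        output.append(consonants[consonant])
        if vowel == "":
            output.append(script["virama"])
        elif vowel != "a":  # the inherent vowel needs no sign
            output.append(signs[vowel])
    return "".join(output)


def transliterate(text, source, target):
    return from_pivot(to_pivot(text, source), target)
```

With a pivot, adding a new language needs only one table for that language rather than one table per language pair; exception rules would be applied on the pivot or per target script.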

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for anyone who wants to type in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should provide webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per jquery.webfonts specification
* Provide a clean api so that other websites can use our webfonts in their websites
* Document the usage
* Provide font preview and download options
* '''This is partly done''' (task from last GSoC)
* flask-webfonts needs further improvements and fine tuning
* This should be integrated into our SILPA and submitted to Flask

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' : Medium

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
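As a first approximation, automatic tagging can be based on the Unicode script of the characters in a post. The sketch below is written in Python for brevity (Diaspora itself is a Ruby on Rails application) and is not part of Diaspora; the script-to-language table, the threshold and the function name are assumptions. Scripts shared by several languages, such as Devanagari or Latin, would need a statistical detector on top of this.

```python
import unicodedata
from collections import Counter

# Script name (first word of the Unicode character name) -> language tag.
# Assumption: one language per script, which is only a rough heuristic.
SCRIPT_TO_LANGUAGE = {
    "MALAYALAM": "ml",
    "DEVANAGARI": "hi",
    "TAMIL": "ta",
    "KANNADA": "kn",
    "LATIN": "en",
}


def detect_languages(text, min_share=0.2):
    """Return sorted language tags whose script makes up at least
    min_share of the recognised characters in the text."""
    counts = Counter()
    for char in text:
        script = unicodedata.name(char, "").split(" ")[0]
        if script in SCRIPT_TO_LANGUAGE:
            counts[SCRIPT_TO_LANGUAGE[script]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(lang for lang, n in counts.items() if n / total >= min_share)
```

The detected tags would only be a default; the post author should still be able to override them, as described above.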

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding support for WebSockets to improve the input experience. This also
includes making scripts that would ease embedding input on any webpage. All the changes done will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal of this project is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, just by learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is inferring the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the varnam framework so that
new language support can be added easily.
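A minimal sketch of the base-form inference, written in Python for brevity (libvarnam itself is written in C): suffix rules are kept in a per-language table so that a new language only needs a new table. The rule list, the minimum stem length and the function name are hypothetical, and a real stemmer would need many more rules and exception handling.

```python
# Hypothetical per-language suffix tables; only a few Hindi suffixes are shown.
SUFFIX_RULES = {
    "hi": [
        "\u0940\u092f",  # -īya, as in भारतीय
        "\u094b\u0902",  # -oṁ, oblique plural
        "\u0947\u0902",  # -eṁ, plural
    ],
}


def infer_base_form(word, language, min_stem_length=2):
    """Strip the longest matching suffix if a long enough stem remains."""
    suffixes = sorted(SUFFIX_RULES.get(language, []), key=len, reverse=True)
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word  # no rule applies: the word is treated as its own base form
```

When learning a word, the inferred base form could be learned alongside it, so that the base word becomes available for suggestions as well.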

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.
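The six-key entry scheme can be sketched as a mapping from a chord of simultaneously pressed keys to a Braille cell. The key-to-dot assignment below (F, D, S for dots 1, 2, 3 and J, K, L for dots 4, 5, 6) is an assumption based on the usual six-key convention, and the function name is hypothetical. The actual m17n layouts would go one step further and map each cell to a letter of the target language, which is not shown here.

```python
# Assumed key-to-dot assignment for six-key entry on a QWERTY keyboard.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

# Unicode Braille Patterns start at U+2800; dot n sets bit n - 1.
BRAILLE_BASE = 0x2800


def chord_to_braille(keys):
    """Convert a chord such as "fd" to the matching Unicode Braille cell."""
    bits = 0
    for key in keys:
        bits |= 1 << (KEY_TO_DOT[key.lower()] - 1)
    return chr(BRAILLE_BASE + bits)
```

For example, under these assumptions pressing F and D together yields the cell with dots 1 and 2 raised.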

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
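For orientation, a MARC21 record in its binary exchange format consists of a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset) and the field data. The sketch below reads and writes that structure in Python for brevity (Grandham itself is a Ruby on Rails application). It is not Grandham code: the function names are hypothetical, indicators are always written as blanks, and the leader values are a minimal assumption.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def parse_marc21_record(record):
    """Return a list of (tag, value) pairs; value is a string for control
    fields (00X) and a list of (code, text) subfields otherwise."""
    base_address = int(record[12:17].decode("ascii"))
    directory = record[24:base_address - 1]  # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        # The stored length includes the trailing field terminator.
        data = record[base_address + start:base_address + start + length - 1]
        if tag.startswith("00"):
            fields.append((tag, data.decode("utf-8")))
        else:
            chunks = data[2:].split(SUBFIELD_DELIMITER)[1:]  # skip indicators
            fields.append((tag, [(c[:1].decode("utf-8"), c[1:].decode("utf-8"))
                                 for c in chunks]))
    return fields


def build_marc21_record(fields):
    """Serialise fields in the structure returned by parse_marc21_record."""
    directory = b""
    body = b""
    for tag, value in fields:
        if isinstance(value, str):  # control field
            data = value.encode("utf-8")
        else:  # data field, written with two blank indicators
            data = b"  " + b"".join(
                SUBFIELD_DELIMITER + code.encode("utf-8") + text.encode("utf-8")
                for code, text in value)
        data += FIELD_TERMINATOR
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(body) + 1
    leader = b"%05dnam a22%05d   4500" % (record_length, base_address)
    return leader + directory + body + RECORD_TERMINATOR


# Sample data for a round trip (control number and title statement).
sample = [("001", "12345"),
          ("245", [("a", "Indulekha"), ("c", "O. Chandu Menon")])]
round_trip = parse_marc21_record(build_marc21_record(sample))
```

In practice an existing MARC library would likely be preferable to a hand-written parser; the sketch is only meant to show what import and export involve.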

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:41:41Z

Nandaja: /* Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm, but it is still not capable of handling the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. Of course we can expand the dictionary, but that doesn't have much value, since words can be formed in Malayalam, Tamil, etc. by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it working for the multi-level suffix stripping required for Malayalam. Sometimes a Malayalam word can be formed by joining more than 5 words together. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted developing a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
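If a Python-based fallback is needed, multi-level suffix stripping could start from something like the sketch below: a word is accepted if it is in the dictionary, or if removing one suffix leaves a word that is accepted in turn, up to a maximum depth. The dictionary, the suffix list (romanised Malayalam) and the function name are hypothetical toy data; sandhi changes at the join are not handled, which is exactly the hard part the project has to address.

```python
def is_valid_word(word, dictionary, suffixes, max_depth=5):
    """Check a word by recursively stripping up to max_depth suffixes."""
    if word in dictionary:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem:
            if is_valid_word(stem, dictionary, suffixes, max_depth - 1):
                return True
    return False


# Toy data: "veedu" (house) + "kal" (plural) + "il" (locative).
dictionary = {"veedu"}
suffixes = ["kal", "il", "ude"]
accepted = is_valid_word("veedukalil", dictionary, suffixes)
```

Limiting the depth keeps the search bounded; a production version would also need rules for which suffixes may follow which, to avoid accepting ungrammatical combinations.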

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support for Indic rendering to ConTeXt MKIV. XeTeX uses HarfBuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification, or HarfBuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. The acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen; it attempts to convey the behavior of the language and to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have really good potential on Android because of the fragmentation that exists in Android and the lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Complexity''' :

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript would help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal support for them. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to words in CMUDict only. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for anyone who wants to type in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should provide webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per jquery.webfonts specification
* Provide a clean api so that other websites can use our webfonts in their websites
* Document the usage
* Provide font preview and download options
* '''This is partly done''' (task from last GSoC)
* flask-webfonts needs further improvements and fine tuning
* This should be integrated into our SILPA and submitted to Flask

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' : Medium

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding support for WebSockets to improve the input experience. This also
includes making scripts that would ease embedding input on any webpage. All the changes done will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal of this project is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, just by learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is inferring the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the varnam framework so that
new language support can be added easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
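
For orientation, a MARC21 record in its binary exchange form (ISO 2709) consists of a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset), and the field data. The Python sketch below parses and builds such a record. It is only an illustration of the format: Grandham itself is a Ruby on Rails application, the function names are hypothetical, and character-set handling and error checking are left out.

```python
FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def parse_marc_record(raw):
    """Parse one record into (tag, value) pairs.

    Control fields (tags below 010) give a string; data fields give
    (indicators, [(subfield_code, text), ...]).
    """
    base = int(raw[12:17])          # leader 12-16: base address of data
    directory = raw[24:base - 1]    # skip the terminator ending the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = raw[base + start:base + start + length].rstrip(FIELD_END)
        if tag < "010":
            fields.append((tag, data.decode("utf-8")))
        else:
            indicators = data[:2].decode("ascii")
            subfields = []
            for chunk in data[2:].split(SUBFIELD)[1:]:
                subfields.append((chunk[:1].decode("ascii"),
                                  chunk[1:].decode("utf-8")))
            fields.append((tag, (indicators, subfields)))
    return fields


def build_marc_record(fields):
    """Serialize fields in the structure returned by parse_marc_record."""
    directory = b""
    body = b""
    for tag, value in fields:
        if isinstance(value, str):
            data = value.encode("utf-8")
        else:
            indicators, subfields = value
            data = indicators.encode("ascii")
            for code, text in subfields:
                data += SUBFIELD + code.encode("ascii") + text.encode("utf-8")
        data += FIELD_END
        # Directory entry: tag, field length in bytes, offset from base address.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FIELD_END + body + RECORD_END
```

Lengths and offsets are counted in bytes, not characters, which matters for Malayalam text encoded as UTF-8.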

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:37:53Z

Nandaja: /* Adding MARC21 import/export feature in Grandham */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kamath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could expand the dictionary, but that adds little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected based on grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words together. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
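
To make "multi-level suffix stripping" concrete, here is a minimal Python sketch of the fallback idea: a word is accepted if it is a known root, or if removing a known suffix leaves a string that is itself accepted. This is not the SILPA or Hunspell algorithm. The function name, the depth limit and the toy data are assumptions, and the sketch treats suffixes as plain concatenation, so it ignores the sandhi changes a real implementation would have to handle.

```python
def is_valid(word, roots, suffixes, max_depth=8):
    """Return True if word is a root, or a root plus up to max_depth suffixes."""
    if word in roots:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        # Only strip a suffix if something is left to check afterwards.
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if is_valid(stem, roots, suffixes, max_depth - 1):
                return True
    return False
```

With made-up romanized data such as <code>roots = {"kutti"}</code> and <code>suffixes = ["kal", "ude", "um"]</code>, the call <code>is_valid("kuttikaludeum", roots, suffixes)</code> strips three suffix levels before reaching the root.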

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support for Indic rendering to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. An acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
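
As an illustration of what a language model estimates, the Python sketch below computes unsmoothed bigram probabilities from a list of sentences. It is a toy example, not the CMU Sphinx training procedure: the function names are hypothetical, and practical models add smoothing so that unseen word pairs do not get probability zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]
```

The probability of a whole sentence is then the product of the bigram probabilities along it.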

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port as many Silpa modules as possible to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Complexity''' :

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript will help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam work well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal data. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now there is no built-in tool for users who want to type in Indian languages. The application is also not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should use webfonts through the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done'''.

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
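
Automatic tagging could start from the writing system of each post. The Python sketch below lists the scripts found in a text, using the first word of each character's Unicode name as a rough script label. This is an assumption-laden illustration, not Diaspora code (Diaspora is written in Ruby on Rails): the function name is hypothetical, and a script only narrows down the language (Devanagari, for instance, is shared by Hindi and Marathi), so a real filter would need a proper language detector plus the manual override described above.

```python
import unicodedata
from collections import Counter


def detect_scripts(text):
    """Return script labels found in text, most frequent first."""
    counts = Counter()
    for ch in text:
        # Count letters and combining marks; skip digits, spaces, punctuation.
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return [script for script, _ in counts.most_common()]
```

A post whose result contains more than one label is a candidate for multiple language tags.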

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] & [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in <code>golang</code> and adding support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal of this project is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning smarter. For example, varnam could infer the word "भारत" when learning "भारतीय". This would work like a Porter stemmer implementation, but integrated into the varnam framework so that new language support can be added easily.
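
One way to picture the stem inference is a table-driven suffix stripper, where supporting a new language only means adding a table entry. The Python sketch below is an assumption for illustration and not varnam's implementation (varnam itself is written in C): the <code>SUFFIXES</code> table, the language code and the function names are hypothetical.

```python
# Hypothetical per-language suffix tables; a new language is a new entry.
SUFFIXES = {
    "hi": ["\u0940\u092f"],  # the Devanagari suffix "ीय"
}


def infer_stems(word, lang):
    """Return candidate base forms obtained by stripping one known suffix."""
    stems = []
    for suffix in SUFFIXES.get(lang, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            stems.append(word[: -len(suffix)])
    return stems


def learn(word, lang, corpus):
    """Add the word and any inferred base forms to the corpus (a set)."""
    corpus.add(word)
    corpus.update(infer_stems(word, lang))
```

With this table, learning "भारतीय" also records "भारत" as a known word.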

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

This project builds support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys standing for the six dots of a Braille cell. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:37:40Z

Nandaja: /* Word corpus synchronization */

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kamath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We could expand the dictionary, but that adds little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected based on grammatical forms (sandhi), number, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words together. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support through XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support for Indic rendering to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. An acoustic model characterizes how sound changes over time and captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python , Flask , Jinja , HTML, Javascript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port as many Silpa modules as possible to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Complexity''' :

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript will help developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam work well; all other languages are first converted to Malayalam and then to English, due to the lack of language-internal data. Also, for English to Indic we currently use CMUDict, so transliteration capability is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now there is no built-in tool for users who want to type in Indian languages. The application is also not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should use webfonts through the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done'''.

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
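
The tagging and filtering steps described above can be sketched as follows. This is only a minimal illustration in Python (Diaspora itself is a Ruby on Rails application), and it detects scripts rather than languages, which is a rough proxy: Devanagari, for example, is shared by several languages. The <code>SCRIPT_TO_TAG</code> table, the <code>override</code> key and both function names are assumptions made for this sketch, not part of Diaspora.

```python
import unicodedata

# Hypothetical mapping from the first word of a Unicode character name
# (which names the script) to a language tag. Script is only a proxy.
SCRIPT_TO_TAG = {
    "MALAYALAM": "ml",
    "DEVANAGARI": "hi",
    "TAMIL": "ta",
    "KANNADA": "kn",
    "LATIN": "en",
}


def detect_tags(text):
    """Return the set of language tags whose script appears in text."""
    tags = set()
    for ch in text:
        # e.g. "MALAYALAM LETTER MA" -> "MALAYALAM"; unnamed chars give "".
        script = unicodedata.name(ch, "").split(" ", 1)[0]
        if script in SCRIPT_TO_TAG:
            tags.add(SCRIPT_TO_TAG[script])
    return tags


def visible_posts(posts, preferred):
    """Keep posts sharing at least one tag with the reader's preferred languages.

    A post may carry a manual "override" set of tags that replaces detection.
    """
    wanted = set(preferred)
    shown = []
    for post in posts:
        tags = post.get("override") or detect_tags(post["text"])
        if tags & wanted:
            shown.append(post)
    return shown
```

A real implementation would need proper language identification for languages that share a script, which is why the manual override matters.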

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input in any web page. All changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all possible prefixes into account and learns all of them to improve future suggestions. Sometimes, however, this is not enough to produce good suggestions. The suggested improvement is to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is inferring the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the varnam framework so that new language support can be added easily.
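
As a rough illustration of the base-form idea, the sketch below strips one known suffix from a word being learned and records the result alongside it. It is written in Python for brevity and is not varnam's code or API: the <code>SUFFIXES</code> table, the function names and the use of a plain set as the corpus are all assumptions, and a single-suffix table is far simpler than a real Porter-style stemmer.

```python
# Hypothetical per-language suffix table. New languages would be supported
# by adding an entry here. "\u0940\u092f" is the Devanagari suffix "-iiya".
SUFFIXES = {
    "hi": ["\u0940\u092f"],
}


def infer_base_forms(word, lang):
    """Return candidate base forms made by stripping one known suffix."""
    bases = []
    for suffix in SUFFIXES.get(lang, []):
        # Require something to remain after the suffix is removed.
        if word.endswith(suffix) and len(word) > len(suffix):
            bases.append(word[: -len(suffix)])
    return bases


def learn(word, lang, corpus):
    """Add word and any inferred base forms to corpus (a set of words)."""
    corpus.add(word)
    for base in infer_base_forms(word, lang):
        corpus.add(base)
    return corpus
```

With this table, learning "भारतीय" under <code>"hi"</code> would also record "भारत". Keeping the suffix rules in data rather than in code is one way to meet the goal of adding new languages easily.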

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This helps build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the SDF-JKL keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have studied Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.
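
The six-key entry scheme can be illustrated with a short Python sketch that turns a chord of pressed keys into a Unicode Braille cell. The key-to-dot assignment used here (F, D, S for dots 1, 2, 3 and J, K, L for dots 4, 5, 6) is an assumption, since the text above only says that the SDF-JKL keys are used. The actual project would express this as m17n layouts and then map cells to Bharati Braille letters, which this sketch does not attempt.

```python
# Assumed key-to-dot assignment: F D S -> dots 1 2 3, J K L -> dots 4 5 6.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

BRAILLE_BASE = 0x2800  # U+2800 BRAILLE PATTERN BLANK


def chord_to_braille(keys):
    """Convert simultaneously pressed keys into one Unicode Braille cell."""
    mask = 0
    for key in keys:
        dot = KEY_TO_DOT[key.lower()]
        # In the Unicode Braille Patterns block, dot n is bit n-1.
        mask |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + mask)
```

For example, <code>chord_to_braille("fd")</code> gives the cell with dots 1 and 2 raised (U+2803).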

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
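
To give a feel for the format involved, here is a minimal sketch of reading and writing the MARC21 transmission format (ISO 2709): a 24-byte leader, a directory of 12-byte entries, then the field data. It is written in Python purely for illustration (Grandham is a Ruby on Rails application), the function names are made up for this sketch, and it ignores character-set handling and validation that a real importer would need.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize [(tag, data_bytes), ...] into one ISO 2709 record."""
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(field), len(body))
        body += field
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    leader = b"%05dnam a22%05d   4500" % (total, base)
    return leader + directory + FT + body + RT


def parse_record(raw):
    """Return [(tag, data_bytes), ...] from one ISO 2709 record."""
    base = int(raw[12:17])           # leader bytes 12-16: base address of data
    directory = raw[24:base - 1]     # exclude the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from the field data.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


def subfields(data):
    """Split a data field into (indicators, [(code, value), ...])."""
    parts = data.split(SF)
    pairs = [(p[:1].decode("utf-8"), p[1:].decode("utf-8")) for p in parts[1:]]
    return parts[0].decode("utf-8"), pairs
```

An export feature would map Grandham's own records onto tags and subfields and pass them to something like <code>build_record</code>; an import would do the reverse after <code>parse_record</code>.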

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-28T12:35:55Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]
* If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We can of course expand the dictionary, but that has little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected based on grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
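
For the Python fallback, multi-level suffix stripping can be pictured as a recursive check: a word is accepted if it is in the dictionary, or if removing a known suffix leaves something that is itself accepted. The sketch below is an assumption-laden illustration, not SILPA's or Hunspell's algorithm. It uses English placeholder words, and it ignores the sound changes (sandhi) at join points that a real Malayalam checker would have to undo.

```python
def is_valid(word, dictionary, suffixes, max_depth=6):
    """Return True if word is in dictionary after stripping 0..max_depth suffixes."""
    if word in dictionary:
        return True
    if max_depth == 0:
        return False
    for suffix in suffixes:
        # Only strip when a non-empty stem would remain.
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if is_valid(stem, dictionary, suffixes, max_depth - 1):
                return True
    return False
```

With the dictionary <code>{"walk"}</code> and the suffixes <code>["ing", "er", "s"]</code>, the word "walkers" is accepted after two levels of stripping. The <code>max_depth</code> parameter makes the number of levels explicit, which is the limitation the text asks to be documented for Hunspell.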

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Complexity''': Advanced

'''Confirmed Mentor''': Santhosh Thottingal

'''How to contact the mentor''': IRC - santhosh on #smc-project on Freenode

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''What the student will learn''':

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Rajeesh K Nambiar

'''How to contact the mentor''': IRC - rajeeshknambiar on #smc-project on Freenode

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''What the students will learn''':

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
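
To make the language-model half concrete, the sketch below estimates bigram probabilities, that is, how likely a word is given the word before it, from a small list of sentences. It is a toy under stated assumptions: sentences are split on whitespace, there is no smoothing for unseen word pairs, and the function names are invented for this illustration. The models actually built for Sphinx would come from a large Malayalam text corpus and dedicated tooling.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        # Every word except the final marker acts as a context.
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]
```

Trained on the two sentences "a b" and "a c", this gives P(b | a) = 0.5, since "a" is followed by "b" in one of its two occurrences.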

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf "Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems"], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Complexity''' :

'''Confirmed Mentor''' : Deepa P Gopinath

'''How to contact the mentor''': IRC - deepagopinath on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentors''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''What the students will learn''':

===Android SDK for Silpa===

'''Project''':

Port the Silpa modules that can be ported to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyas have really good potential on Android because of the fragmentation that exists in Android and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Complexity''' :

'''Confirmed Mentors''' : Hrishikesh K. B, Jishnu Mohan, Aashik S

'''How to contact the mentor''': IRC -
*Hrishikesh K B - stultus on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode
*Aashik S - irumbumoideen on #smc-project on Freenode

'''Expertise required''': Java, Android, Python

'''What the students will learn''':

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to node modules. Several modules and algorithms in the SILPA project are currently written in Python, but porting them to JavaScript helps developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.
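
As an idea of the size of algorithm a port involves, here is a small fuzzy-matching sketch based on plain Levenshtein edit distance. This is a stand-in chosen for illustration: SILPA's own approximate search algorithm is not described on this page and may work differently, and the function names are assumptions. The sketch is in Python, the language the existing modules are written in; the logic would carry over to JavaScript largely unchanged.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_find(query, words, max_distance=1):
    """Return the words within max_distance edits of query, in input order."""
    return [w for w in words if edit_distance(query, w) <= max_distance]
```

Note that this compares Unicode code points, so for Indic scripts a vowel sign counts as its own edit; a script-aware search would likely want to compare syllables or phonetic classes instead.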

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms they plan to port, planned demo applications, planned documentation details, and publishing details (for example, the npm registry).

'''Complexity''' :

'''Confirmed Mentor''' : Jishnu Mohan

'''How to contact the mentor''': IRC - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': javascript, python

'''What the students will learn''':

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam work well; all other languages are first converted to Malayalam and then to English, due to a lack of language internals. Also, for English to Indic we currently use CMUDict, so the transliteration capability is limited to words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
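
The intermediate-representation approach can be sketched as two table lookups: source script to a romanized pivot, then pivot to target script. The tables below are tiny and purely illustrative, covering three consonant letters each with their inherent vowel "a"; they are not taken from CLDR or from SILPA, and the function names are assumptions. Vowel signs, conjuncts and the exception rules mentioned above are all left out.

```python
# Illustrative tables only: three consonants per script, ISO 15919-style values.
MALAYALAM_TO_ISO = {"\u0d15": "ka", "\u0d2e": "ma", "\u0d32": "la"}  # ക മ ല
KANNADA_TO_ISO = {"\u0c95": "ka", "\u0cae": "ma", "\u0cb2": "la"}    # ಕ ಮ ಲ


def to_iso(text, table):
    """Map each source letter to its romanized unit."""
    return [table[ch] for ch in text]


def from_iso(units, table):
    """Map romanized units back to letters using the inverted table."""
    reverse = {iso: ch for ch, iso in table.items()}
    return "".join(reverse[u] for u in units)


def transliterate(text, source_table, target_table):
    """Transliterate by pivoting through the romanized representation."""
    return from_iso(to_iso(text, source_table), target_table)
```

With a pivot, each language needs only one table to and from the intermediate form, instead of one table per language pair, so no language has to be routed through Malayalam first.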

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC -
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': Python

'''What the students will learn''':

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but it currently has no built-in tool for typing in Indian languages. The application is also not internationalized. Both can be addressed using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be integrated into the SILPA Flask framework with a nice templating system. Similarly, the interface should provide webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Silpa currently provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done.'''

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Complexity''' :

'''Confirmed Mentors''' : Vasudev Kamath, Jishnu Mohan

'''How to contact the mentor''': IRC
*Vasudev Kamath - copyninja on #smc-project and #silpa on Freenode
*Jishnu Mohan - jishnu7 on #smc-project and #silpa on Freenode

'''Expertise required''': jQuery, CSS, HTML5, Python, Flask, technical understanding of fonts

'''What the students will learn''':

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

'''Complexity''' :

'''Confirmed Mentors''' : Pirate Praveen, Ershad K

'''How to contact the mentors''': IRC
*Pirate Praveen - j4v4m4n on #smc-project on Freenode
*Ershad K - ershad on #smc-project on Freenode

'''Expertise required''': Ruby on Rails

'''What the students will learn''':

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input in any web page. All changes will go live on [1].

'''Complexity''' : Advanced

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Basic understanding of golang and C

'''What the students will learn''':

===Improve the learning system===

'''Project''':

The main goal is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all possible prefixes into account and learns all of them to improve future suggestions. Sometimes, however, this is not enough to produce good suggestions. The suggested improvement is to infer the base form of the word being learned.

Varnam has a learning system built-in which can learn words and it can also learn possible other ways to write a word. Consider the following example.

<code>
<pre>
learn("भारत") = [bharat, bhaarath, bharath]
transliterate("bharat") = भारत
transliterate("bhaarath") = भारत
transliterate("bharath") = भारत
</pre>
</code>

Varnam also learns a word's prefixes so that it can produce better predictions for any word which has the same prefix. So in this case, with just learning the word "भारत", varnam can predict "bharateey" = "भारतीय".

The proposed idea is to make this learning better. One example is inferring the word "भारत" when learning "भारतीय". This would be something like a Porter stemmer implementation, but integrated into the varnam framework so that new language support can be added easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C, Ruby (basics)

'''What the students will learn''':

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This helps build the online word corpus easily.

'''Complexity''' : Medium

'''Confirmed Mentor''' : Navaneeth K N

'''How to contact the mentor''': IRC - nkn__ on #smc-project on Freenode

'''Expertise required''': Knowledge in C/golang

'''What the students will learn''':

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the SDF-JKL keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have studied Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Complexity''' :

'''Confirmed Mentor''' : Anivar Aravind

'''How to contact the mentor''': IRC - anivar on #smc-project on Freenode

'''Expertise required''':

'''What the students will learn''':

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Complexity''' : High

'''Confirmed Mentor''' : Ershad K

'''How to contact the mentor''': IRC - ershad on #smc-project on Freenode

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''What the students will learn''':

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

=Projects with unconfirmed mentors=

GSoC/2014/Project ideas

2014-02-25T19:42:06Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]

If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. We can of course expand the dictionary, but that has little value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected based on grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result at https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''Complexity''': Advanced

'''Mentor''': Santhosh Thottingal

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.

'''Complexity''' : Advanced

'''Mentor''' : Rajeesh K Nambiar

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

'''Background Reading'''
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf "Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems"], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Mentor''': Deepa P. Gopinath

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''Mentor''' : Vasudev/Jishnu

===Android SDK for Silpa===

'''Project''':

Port the Silpa modules that can be ported to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyas have really good potential on Android because of the fragmentation that exists in Android and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Expertise required''': Java, Android, Python

'''Mentor''' : Jishnu/Hrishikesh/Aashik

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript would make them available to web developers. For example, cross-language transliteration can be done in JavaScript if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation, and publishing details (for example, the npm registry).

'''Expertise required''': javascript, python

'''Mentor''' : Jishnu

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well; all other languages are first converted to Malayalam and then to English, because there is no language-independent internal representation. Also, English to Indic transliteration currently uses CMUDict, so it is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
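The benefit of an intermediate representation is that each script needs only one encoder and one decoder, instead of one converter per language pair. The sketch below shows that architecture with a deliberately naive pivot: the offset of each character inside its Unicode block, relying on the parallel layout of the Indic blocks. This is neither IPA nor ISO 15919, nor the current SILPA method, and <code>to_pivot</code> and <code>from_pivot</code> are hypothetical names.

```python
# First code point of each script's Unicode block.
BLOCK_START = {
    "devanagari": 0x0900,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def to_pivot(text, script):
    """Encode text as block offsets; characters outside the block pass through."""
    base = BLOCK_START[script]
    pivot = []
    for ch in text:
        offset = ord(ch) - base
        pivot.append(offset if 0 <= offset < BLOCK_SIZE else ch)
    return pivot


def from_pivot(pivot, script):
    """Decode block offsets back into the given script."""
    base = BLOCK_START[script]
    return "".join(chr(base + p) if isinstance(p, int) else p for p in pivot)


pivot = to_pivot("മലയാളം", "malayalam")
kannada = from_pivot(pivot, "kannada")
print(kannada)
```

Not every code point has a counterpart in every block, and the offsets carry no pronunciation information, which is why the exception rules and special-case handling mentioned above would still be needed on top of any real pivot.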

'''Expertise required''': Python

'''Mentor''' : Vasudev/Jishnu

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should provide webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done.'''

'''More Details'''
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Expertise required''': jQuery, CSS, HTML5, Python, Flask, technical understanding of fonts

'''Mentor''' : Jishnu/Vasudev

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
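The tagging and filtering steps can be sketched as follows. This is a minimal illustration in Python rather than Rails code, and it detects Unicode scripts, not languages: Hindi and Marathi, for example, would both be tagged as Devanagari, so a real implementation would need a proper language detector. The function names are hypothetical.

```python
import unicodedata


def detect_scripts(post: str) -> set:
    """Return the Unicode scripts used by the letters and marks in a post."""
    scripts = set()
    for ch in post:
        # Consider letters and combining marks; skip digits, spaces, punctuation.
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                # Character names start with the script, e.g. "MALAYALAM LETTER MA".
                scripts.add(name.split()[0])
    return scripts


def is_visible(post: str, preferred: set) -> bool:
    """Show a post if it has no detectable script or shares one with the reader."""
    scripts = detect_scripts(post)
    return not scripts or bool(scripts & preferred)


tags = detect_scripts("Hello മലയാളം")
print(sorted(tags))
print(is_visible("Hello world", {"MALAYALAM"}))
```

Storing the detected tags on the post, instead of recomputing them on every view, is what would allow the author to override a wrong detection.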

'''Expertise required''': Ruby on Rails

'''Mentor''': Pirate Praveen, Ershad K

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox add-on], a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome add-on] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. It currently supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in golang and added support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input on any webpage. All the changes will go live on [1].

'''Expertise required''': Basic understanding of golang and C

'''Complexity''': Advanced

'''Mentor''': Navaneeth S

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes words when learning them. Today, when a word is learned, Varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.
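The prefix learning described above can be pictured with the sketch below. It is written in Python for brevity and is not libvarnam code: the token split of the example word, the dictionary used as a corpus and the function names are all assumptions.

```python
def prefixes(tokens):
    """Return every non-empty prefix of a token sequence, shortest first."""
    return [tokens[:i] for i in range(1, len(tokens) + 1)]


def learn(tokens, corpus):
    """Record the word and all of its prefixes, counting repeated sightings."""
    for prefix in prefixes(tokens):
        key = "".join(prefix)
        corpus[key] = corpus.get(key, 0) + 1


corpus = {}
# Hypothetical tokenization of a romanized word, for illustration only.
learn(["ma", "la", "ya", "lam"], corpus)
learn(["ma", "la"], corpus)
print(corpus)
```

Prefixes only capture what a word starts with. Inferring the base form would in addition link inflected forms that share a root but diverge at the end, which is the gap this project targets.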

'''Expertise required''': Knowledge in C

'''Complexity''': Medium

'''Mentor''': Navaneeth S

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This will make it easier to build the online word corpus.

'''Expertise required''': Knowledge in C/golang

'''Complexity''' : Medium

'''Mentor''': Navaneeth S

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.
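The six-key entry can be illustrated by turning a chord of pressed keys into a Unicode Braille pattern. The sketch assumes the conventional Perkins-style assignment (F, D, S for dots 1 to 3 and J, K, L for dots 4 to 6), which is not spelled out above, and it stops at the Braille cell: mapping cells to Indic letters according to Bharati Braille is the actual work of the m17n layouts.

```python
# Assumed key-to-dot assignment (Perkins-style); not taken from the project.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

BRAILLE_BLOCK_START = 0x2800  # U+2800 is the blank Braille pattern


def chord_to_braille(keys):
    """Convert simultaneously pressed keys into one Braille pattern character."""
    mask = 0
    for key in keys:
        # In the Unicode Braille block, dot n corresponds to bit n-1.
        mask |= 1 << (KEY_TO_DOT[key] - 1)
    return chr(BRAILLE_BLOCK_START + mask)


print(chord_to_braille("f"))       # dot 1
print(chord_to_braille("fd"))      # dots 1 and 2
print(chord_to_braille("sdfjkl"))  # all six dots
```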

'''More Details'''
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/

'''Mentor''': Anivar Aravind

=Projects with unconfirmed mentors=

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
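For orientation, a MARC21 record in its transmission format (ISO 2709) is a 24-character leader, a directory of 12-character entries (tag, field length, start offset) and the field data. The round trip below sketches that layout in Python rather than Ruby and is not Grandham code: the leader values and sample fields are assumptions, and a real implementation would more likely rely on an existing MARC library.

```python
FIELD_END, RECORD_END, SUBFIELD = b"\x1e", b"\x1d", b"\x1f"


def build_record(fields):
    """Serialize (tag, bytes) pairs into one ISO 2709 record."""
    directory, data = b"", b""
    for tag, value in fields:
        body = value + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1  # leader + directory + directory terminator
    total = base + len(data) + 1    # plus the record terminator
    # Leader: record length at 0-4, base address of data at 12-16.
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Read the (tag, bytes) pairs back using the leader and the directory."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        length, start = int(entry[3:7]), int(entry[7:12])
        # Drop the trailing field terminator from each field.
        body = raw[base + start:base + start + length - 1]
        fields.append((entry[:3].decode("ascii"), body))
    return fields


# Hypothetical sample: a control number and a title field with subfield "a".
sample = [
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD + b"a" + "മലയാളം".encode("utf-8")),
]
raw = build_record(sample)
print(parse_record(raw) == sample)
```

Lengths and offsets are counted in bytes, not characters, which matters for Malayalam titles encoded as UTF-8.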

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''Complexity''' : High

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2014/Project ideas

2014-02-25T19:39:21Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]

If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. But it is still not capable of handling the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. Of course we can expand the dictionary, but that doesn't add much value, since words in Malayalam, Tamil, etc. can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping required for Malayalam. Sometimes a Malayalam word can be formed by joining more than 5 words together. We will need word-splitting logic or a table that takes care of all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
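Multi-level suffix stripping can be sketched as a recursive lookup: strip one known suffix, then check whether the remainder is a dictionary word or can be stripped further. The example below is a minimal illustration, not Hunspell's algorithm or the SILPA spellchecker. The English root and suffixes are toy data standing in for real morphology, <code>strip_suffixes</code> and <code>max_depth</code> are hypothetical names, and sound changes at the joins (sandhi) are ignored.

```python
def strip_suffixes(word, roots, suffixes, max_depth=5):
    """Return (root, suffixes in order of attachment), or None if not derivable."""
    if word in roots:
        return word, []
    if max_depth == 0:
        return None
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            result = strip_suffixes(word[:-len(suffix)], roots, suffixes, max_depth - 1)
            if result is not None:
                root, stripped = result
                return root, stripped + [suffix]
    return None


# Toy data for illustration only.
ROOTS = {"help"}
SUFFIXES = ["ful", "less", "ness"]

print(strip_suffixes("helpfulness", ROOTS, SUFFIXES))
print(strip_suffixes("helpxyz", ROOTS, SUFFIXES))
```

A word is accepted if any chain of suffixes leads back to a dictionary root, so the dictionary only needs to hold roots and not every inflected form.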

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''

'''Expertise required''': Average level understanding of grammar system of at least one Indian language

'''Complexity''': Advanced

'''Mentor''': Santhosh Thottingal

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''

'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua; familiarity with Lua, OpenType specifications or Harfbuzz will be an added advantage.

'''Complexity''' : Advanced

'''Mentor''' : Rajeesh K Nambiar

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. The acoustic model characterizes how sound changes over time; it captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
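To make the role of the language model concrete, the sketch below estimates bigram probabilities, that is, the probability of a word given the word before it, from a tiny corpus. It is a plain maximum-likelihood estimate without smoothing, written with the Python standard library only. It does not use any CMU Sphinx tool, and the English sentences are placeholder data.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Placeholder corpus for illustration only.
unigrams, bigrams = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
print(bigram_probability("the", "cat", unigrams, bigrams))
print(bigram_probability("cat", "sat", unigrams, bigrams))
```

For an agglutinative language like Malayalam the number of distinct word forms is very large, so a real model would need a much bigger text corpus and some form of smoothing for unseen word pairs.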

* '''Background Reading'''
** [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
** [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
** http://www.speech.cs.cmu.edu/
** http://cmusphinx.sourceforge.net/wiki/tutorial
** [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
** [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
** [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
** [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Mentor''': Deepa P. Gopinath

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript

'''Mentor''' : Vasudev/Jishnu

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration, and Payyans have good potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Expertise required''': Java, Android, Python

'''Mentor''' : Jishnu/Hrishikesh/Aashik

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript would make them available to web developers. For example, cross-language transliteration can be done in JavaScript if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation, and publishing details (for example, the npm registry).

'''Expertise required''': javascript, python

'''Mentor''' : Jishnu

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well; all other languages are first converted to Malayalam and then to English, because there is no language-independent internal representation. Also, English to Indic transliteration currently uses CMUDict, so it is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

'''Expertise required''': Python

'''Mentor''' : Vasudev/Jishnu

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both of these can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be built into the SILPA Flask framework with a nice templating system. Similarly, the interface should provide webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done.'''

'''More Details'''
** [https://github.com/wikimedia/jquery.i18n jquery.i18n]
** [https://github.com/wikimedia/jquery.ime jquery.ime]
** [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]

'''Expertise required''': jQuery, CSS, HTML5, Python, Flask, technical understanding of fonts

'''Mentor''' : Jishnu/Vasudev

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

'''Expertise required''': Ruby on Rails

'''Mentor''': Pirate Praveen, Ershad K

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox add-on], a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome add-on] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. It currently supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes a rewrite of the current implementation in golang and added support for WebSockets to improve the input experience. It also includes writing scripts that make it easy to embed Varnam input on any webpage. All the changes will go live on [1].

'''Expertise required''': Basic understanding of golang and C

'''Complexity''': Advanced

'''Mentor''': Navaneeth S

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes words when learning them. Today, when a word is learned, Varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. But sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

'''Expertise required''': Knowledge in C

'''Complexity''': Medium

'''Mentor''': Navaneeth S

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool which can upload/download the word corpus from offline IMEs like varnam-ibus[2]. This will make it easier to build the online word corpus.

'''Expertise required''': Knowledge in C/golang

'''Complexity''' : Medium

'''Mentor''': Navaneeth S

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.

* '''More Details'''
** http://www.acharya.gen.in:8080/disabilities/bh_brl.php
** http://en.wikipedia.org/wiki/Bharati_Braille
** http://www.nongnu.org/m17n/

'''Mentor''': Anivar Aravind

=Projects with unconfirmed mentors=

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''Complexity''' : High

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2014/Project ideas

2014-02-25T19:35:42Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]

If you want to propose an idea, please do it in [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list]

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a not-so-simple algorithm. But it is still not capable of handling the inflection and agglutination that occur in Indian languages, especially South Indian languages. The dictionary we have for the Malayalam spellchecker has about 150,000 words. Of course we can expand the dictionary, but that doesn't add much value, since words in Malayalam, Tamil, etc. can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender, etc. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping required for Malayalam. Sometimes a Malayalam word can be formed by joining more than 5 words together. We will need word-splitting logic or a table that takes care of all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns; we can ask for help from existing language communities too. If Hunspell has limitations with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document them (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''
'''Expertise required''': Average level understanding of grammar system of at least one Indian language
'''Complexity''': Advanced
'''Mentor''': Santhosh Thottingal

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''
'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua; familiarity with Lua, OpenType specifications or Harfbuzz will be an added advantage.
'''Complexity''' : Advanced
'''Mentor''' : Rajeesh K Nambiar

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools, which can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that particular language. The acoustic model characterizes how sound changes over time; it captures the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems have already been developed using the Sphinx engine for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

* '''Background Reading'''
** [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
** [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
** http://www.speech.cs.cmu.edu/
** http://cmusphinx.sourceforge.net/wiki/tutorial
** [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET, Hyderabad, Prof. Dr. S. Rama Mohan Rao, A. Gopi
** [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
** [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
** [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Mentor''': Deepa P. Gopinath

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.
'''Expertise required''': Python, Flask, Jinja, HTML, JavaScript
'''Mentor''' : Vasudev/Jishnu

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration, and Payyans have good potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Expertise required''': Java, Android, Python
'''Mentor''' : Jishnu/Hrishikesh/Aashik

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the Silpa algorithms to Node modules. Several modules and algorithms in the SILPA project are currently written in Python, and porting them to JavaScript would make them available to web developers. For example, cross-language transliteration can be done in JavaScript if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported. A flexible fuzzy search on web pages will be possible if we have the algorithm in JavaScript.

Proposed javascript module pattern is https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, planned demo applications, planned documentation, and publishing details (for example, the npm registry).

'''Expertise required''': javascript, python
'''Mentor''' : Jishnu

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well; all other languages are first converted to Malayalam and then to English, because there is no language-independent internal representation. Also, English to Indic transliteration currently uses CMUDict, so it is limited to the words in CMUDict. We could probably develop a better method for English to Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.

'''Expertise required''':python
'''Mentor''' : Vasudev/Jishnu

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both can be achieved using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n support should live in the SILPA Flask framework with a nice templating system. Similarly, the interface should load webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Silpa currently provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done'''.

'''More Details'''
** [https://github.com/wikimedia/jquery.i18n jquery.i18n]
** [https://github.com/wikimedia/jquery.ime jquery.ime]
** [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]
'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts
'''Mentor''' : Jishnu/Vasudev

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for readers who don't understand one of those languages. This task is to tag every post with the languages used in it, ideally detected automatically, but with an option to override the detection. Once each post has a language tag, people should be able to choose their preferred languages, and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.
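
The automatic tagging step can be sketched roughly as below. The sketch is in Python rather than Ruby, and it detects the writing script, not the language, by taking the first word of each character's Unicode name as a rough script label. That is an assumption which works for Latin and the Indic scripts but cannot tell apart languages that share a script. The function names are hypothetical and not part of Diaspora.

```python
import unicodedata


def detect_scripts(text):
    """Return rough script labels (e.g. "LATIN", "MALAYALAM") used in the text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def visible_posts(posts, preferred_scripts):
    """Keep only the posts that use at least one of the preferred scripts."""
    return [post for post in posts if detect_scripts(post) & preferred_scripts]
```

The detected set would be stored as the post's default tag, which the author can then override.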

'''Expertise required''': Ruby on Rails
'''Mentor''': Pirate Praveen, Ershad K

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Expertise required''': Basic understanding of golang and C
'''Complexity''': Advanced
'''Mentor''': Navaneeth S

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes when learning words. Today, when a word is learned, Varnam takes all its possible prefixes into account and learns all of them to improve future suggestions. Sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.
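
The prefix-based learning described above can be pictured with the following sketch. It is not Varnam's implementation (Varnam is written in C). The class name is hypothetical, and character-level prefixes stand in here for whatever token units Varnam actually uses.

```python
from collections import defaultdict


class PrefixLearner:
    """Toy learner that records every prefix of each learned word."""

    def __init__(self):
        self.counts = defaultdict(int)

    def learn(self, word):
        # Learn the word itself and all of its shorter prefixes.
        for end in range(1, len(word) + 1):
            self.counts[word[:end]] += 1

    def suggest(self, typed, limit=5):
        # Rank by how often an entry was learned, then prefer shorter entries.
        matches = [w for w in self.counts if w.startswith(typed)]
        matches.sort(key=lambda w: (-self.counts[w], len(w), w))
        return matches[:limit]
```

The sketch also shows the weakness the project targets: an inflected word contributes its prefixes, but nothing links it to its base form unless that form happens to be one of those prefixes.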

'''Expertise required''': Knowledge in C
'''Complexity''': Medium
'''Mentor''': Navaneeth S

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus of offline IMEs like varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Expertise required''': Knowledge in C/golang
'''Complexity''' : Medium
'''Mentor''': Navaneeth S

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

This project builds support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F, J, K and L keys standing for the six dots of a Braille cell. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.
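
The six-key entry scheme can be illustrated with a short sketch that turns a chord of pressed keys into the corresponding Unicode Braille pattern. The assignment of keys to dot numbers below (F, D, S for dots 1 to 3 and J, K, L for dots 4 to 6) is an assumption based on the usual Perkins-style layout and is not taken from the project itself. Mapping Braille cells to Bharati Braille letters is a separate table lookup that is not shown.

```python
# Assumed Perkins-style assignment of home-row keys to Braille dot numbers.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

# Unicode Braille Patterns start at U+2800; dot n sets bit n-1.
BRAILLE_BASE = 0x2800


def chord_to_braille(keys):
    """Convert keys pressed together (e.g. "fd") into one Braille character."""
    mask = 0
    for key in keys.lower():
        mask |= 1 << (KEY_TO_DOT[key] - 1)
    return chr(BRAILLE_BASE + mask)
```

An m17n layout would express the same mapping declaratively, with each chord producing the Indic letter assigned to that cell.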

* '''More Details'''
** http://www.acharya.gen.in:8080/disabilities/bh_brl.php
** http://en.wikipedia.org/wiki/Bharati_Braille
** http://www.nongnu.org/m17n/

'''Mentor''': Anivar Aravind

=Projects with unconfirmed mentors=

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.
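
For orientation, a MARC21 record in its transmission format (ISO 2709) consists of a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset) and the field data. The sketch below is in Python rather than Ruby and is not Grandham code. It builds and parses such a record with hypothetical helper names, ignoring indicators, subfield parsing and character-set handling.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator


def build_record(fields):
    """Serialize (tag, value) pairs into a minimal MARC21/ISO 2709 record."""
    directory, data = b"", b""
    for tag, value in fields:
        body = value.encode("utf-8") + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(data) + 1     # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Parse a record produced in this layout back into (tag, value) pairs."""
    base = int(raw[12:17])           # base address of data, from the leader
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, body.decode("utf-8")))
    return fields
```

An import feature would run the parsing half over uploaded files and map tags to Grandham's own fields. Export is the reverse.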

'''Expertise required''': Knowledge in Ruby/Ruby on Rails
'''Complexity''' : High

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2014/Project ideas

2014-02-25T19:32:51Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]

If you want to propose an idea, please do it on the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list].

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a fairly sophisticated algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that does not add much value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask for help from existing language communities too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
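
If a Python-based solution becomes necessary, multi-level suffix stripping could start from something like the sketch below. It is a toy under loud assumptions: the function name is hypothetical, the example uses English-like data, and it ignores the letter changes that sandhi introduces at the joins, which a real Malayalam or Tamil checker must handle.

```python
def strip_suffixes(word, stems, suffixes, max_levels=5):
    """Return [stem, suffix1, suffix2, ...] if the word can be analysed, else None."""
    if word in stems:
        return [word]
    if max_levels == 0:
        return None
    for suffix in suffixes:
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            # Strip one suffix and try to analyse the remainder recursively.
            analysis = strip_suffixes(word[:-len(suffix)], stems, suffixes,
                                      max_levels - 1)
            if analysis is not None:
                return analysis + [suffix]
    return None
```

A word is accepted as correctly spelled when an analysis exists. The <code>max_levels</code> parameter makes the depth of stripping explicit, which is exactly the dimension where Hunspell is reported to be limited.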

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''
'''Expertise required''': Average level understanding of grammar system of at least one Indian language
'''Complexity''': Advanced
'''Mentor''': Santhosh Thottingal

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz for correct Indic rendering.

'''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>

* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''
'''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.
'''Complexity''' : Advanced
'''Mentor''' : Rajeesh K Nambiar

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system for any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time and capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
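
To make the role of the language model concrete, here is a minimal sketch of a bigram model estimated by simple counting. It only illustrates the idea of assigning probabilities to word sequences. The function name is hypothetical, no smoothing is applied, and the models actually used with CMU Sphinx are built with dedicated toolkits from much larger text corpora.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return a function giving P(word | previous) from raw bigram counts."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])           # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))

    def probability(previous, word):
        if history_counts[previous] == 0:
            return 0.0  # unseen history; a real model would back off or smooth
        return bigram_counts[(previous, word)] / history_counts[previous]

    return probability
```

With the two training sentences <code>"a b"</code> and <code>"a c"</code>, the model assigns probability 0.5 to <code>b</code> following <code>a</code>, because <code>a</code> is followed by <code>b</code> in one of its two occurrences.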

* '''Background Reading'''
** [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
** [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
** http://www.speech.cs.cmu.edu/
** http://cmusphinx.sourceforge.net/wiki/tutorial
** [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
** [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
** [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
** [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao

'''Mentor''': Deepa P. Gopinath

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.
'''Expertise required''': Python , Flask , Jinja , HTML, Javascript
'''Mentor''' : Vasudev/Jishnu

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

'''Expertise required''': Java, Android, Python
'''Mentor''' : Jishnu/Hrishikesh/Aashik

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently implemented in Python, and porting them to JavaScript would make them available to web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is UMD: https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, the planned demo applications, the planned documentation, and publishing details (for example, the npm registry).

'''Expertise required''': javascript, python
'''Mentor''' : Jishnu

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well. All other languages are first converted to Malayalam and then to English, because there is no language-internal representation for them. Also, English-to-Indic transliteration currently uses CMUDict, so it is limited to the words present in CMUDict. We could probably develop a better method for English-to-Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.

'''Expertise required''':python
'''Mentor''' : Vasudev/Jishnu

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both can be achieved using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n support should live in the SILPA Flask framework with a nice templating system. Similarly, the interface should load webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Silpa currently provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done'''.

'''More Details'''
** [https://github.com/wikimedia/jquery.i18n jquery.i18n]
** [https://github.com/wikimedia/jquery.ime jquery.ime]
** [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]
'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts
'''Mentor''' : Jishnu/Vasudev

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for readers who don't understand one of those languages. This task is to tag every post with the languages used in it, ideally detected automatically, but with an option to override the detection. Once each post has a language tag, people should be able to choose their preferred languages, and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

'''Expertise required''': Ruby on Rails
'''Mentor''': Pirate Praveen, Ershad K

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

There are Varnam clients available as [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] and [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome] add-ons, and as an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to http://varnamproject.com/editor. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

'''Expertise required''': Basic understanding of golang and C
'''Complexity''': Advanced
'''Mentor''': Navaneeth S

===Improve the learning system===

'''Project''':

The main goal is to improve how Varnam tokenizes when learning words. Today, when a word is learned, Varnam takes all its possible prefixes into account and learns all of them to improve future suggestions. Sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

'''Expertise required''': Knowledge in C
'''Complexity''': Medium
'''Mentor''': Navaneeth S

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus of offline IMEs like varnam-ibus[2]. This makes it easier to build the online word corpus.

'''Expertise required''': Knowledge in C/golang
'''Complexity''' : Medium
'''Mentor''': Navaneeth S

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

This project builds support for Bharati Braille keyboard layouts on GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F, J, K and L keys standing for the six dots of a Braille cell. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.

* '''More Details'''
** http://www.acharya.gen.in:8080/disabilities/bh_brl.php
** http://en.wikipedia.org/wiki/Bharati_Braille
** http://www.nongnu.org/m17n/

'''Mentor''': Anivar Aravind

=Projects with unconfirmed mentors=

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Expertise required''': Knowledge in Ruby/Ruby on Rails
'''Complexity''' : High

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

GSoC/2014/Project ideas

2014-02-25T19:28:10Z

Nandaja:

<center>
<font color="red"> <big>'''Apart from the following ideas , you can propose your own ideas'''</big></font>
</center>

=Potential Mentors=
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# jain Basil ('''jainbasil''' on irc.freenode.net)
# Ershad K ('''ershad''' on irc.freenode.net)
# Navaneeth ('''nkn__''' on irc.freenode.net)
# Nishan Naseer ('''nishan''' on irc.freenode.net)
# Nandaja Varma ('''gem''' on irc.freenode.net)

=Ideas for Google Summer of Code 2014=
* Please Read the [http://wiki.smc.org.in/SoC/2014#FAQ FAQ]

If you want to propose an idea, please do it on the [http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in student projects mailing list].

=Projects with confirmed mentors=

== A spell checker for Indic language that understands inflections ==

'''Project''':

The SILPA project has a spellchecker written in Python with a fairly sophisticated algorithm. Still, it cannot handle the inflection and agglutination that occur in Indian languages, especially south Indian languages. The dictionary we have for the Malayalam spellchecker has about 150000 words. Of course we can expand the dictionary, but that does not add much value, since words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words get inflected based on grammatical forms (sandhi), plural, gender and so on. Hunspell has a system to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need word-splitting logic or a table that takes care of all patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently the Tamil community attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. It will probably require a lot of scripting to generate suffix patterns, and we can ask for help from existing language communities too. If Hunspell turns out to be limited with multi-level suffixes (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document it (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.

* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''
* '''Expertise required''': Average level understanding of grammar system of at least one Indian language
* '''Complexity''': Advanced
* '''Mentor''': Santhosh Thottingal

==Indic rendering support in ConTeXt==

'''Project''':

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz for correct Indic rendering.

* '''More Details''': A partially working patch by Rajeesh for MKIV lua code is available. ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
<pre>
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
</pre>
Generate the output using command
<pre>
texexec --xetex <file.tex>
</pre>
* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]'''
* '''Expertise required''': Understanding of the TeX system, experience in either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.
* '''Complexity''' : Advanced
* '''Mentor''' : Rajeesh K Nambiar

==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==

'''Project''':

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system for any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time and capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.

* '''Background Reading'''
** [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
** [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
** http://www.speech.cs.cmu.edu/
** http://cmusphinx.sourceforge.net/wiki/tutorial
** [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
** [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza, M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
** [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori, Hussein Hiyassat, Mostafa Harti, and Noureddine Chenfour
** [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao
* '''Mentor''': Deepa P. Gopinath

==Silpa based==

===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===

'''Project''':

Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.
* '''Expertise required''': Python , Flask , Jinja , HTML, Javascript
* '''Mentor''' : Vasudev/Jishnu

===Android SDK for Silpa===

'''Project''':

Port suitable Silpa modules to Java and create an SDK so that other developers can use them in their apps. Modules like Indic Render, Transliteration and Payyans have real potential on Android because of the platform's fragmentation and its lack of proper Indic support. This SDK will help developers support their Indic apps on a wide range of Android devices.

* '''Expertise required''': Java, Android, Python
* '''Mentor''' : Jishnu/Hrishikesh/Aashik

===Converting indic processing modules currently in SILPA into javascript modules library===

'''Project''':

Port some of the SILPA algorithms to Node modules. Several modules and algorithms in the SILPA project are currently implemented in Python, and porting them to JavaScript would make them available to web developers. For example, cross-language transliteration can be done in JavaScript too if we port the algorithm and the transliteration rules. Similarly, the approximate search can be ported: a flexible fuzzy search on web pages becomes possible once the algorithm is available in JavaScript.

The proposed JavaScript module pattern is UMD: https://github.com/umdjs/umd

Student proposals should include a list of the algorithms to be ported, the planned demo applications, the planned documentation, and publishing details (for example, the npm registry).

* '''Expertise required''': javascript, python
* '''Mentor''' : Jishnu

=== Improving cross language transliteration system. ===

'''Project''':

Currently only Kannada and Malayalam are handled well. All other languages are first converted to Malayalam and then to English, because there is no language-internal representation for them. Also, English-to-Indic transliteration currently uses CMUDict, so it is limited to the words present in CMUDict. We could probably develop a better method for English-to-Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve more accurate results.

* '''Expertise required''':python
* '''Mentor''' : Vasudev/Jishnu

=== Internationalize SILPA project with Wikimedia jquery projects , Improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===

'''Project''':

'''Internationalize SILPA''' :-
The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both can be achieved using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n support should live in the SILPA Flask framework with a nice templating system. Similarly, the interface should load webfonts using the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.

'''Improve the webfonts ''' :-
* Silpa currently provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Create a repo as per the jquery.webfonts specification
* Provide a clean API so that other websites can use our webfonts
* Document the usage
* Provide font preview and download options
* '''This is partly done'''.

* '''More Details'''
** [https://github.com/wikimedia/jquery.i18n jquery.i18n]
** [https://github.com/wikimedia/jquery.ime jquery.ime]
** [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]
* '''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts
* '''Mentor''' : Jishnu/Vasudev

==Language filter for diaspora==

Diaspora is a Free Software, federated social networking platform. Diaspora users post in many languages. When people use more than one language in their posts, it is inconvenient for people who don't understand a language. This task is to tag every post with languages used in the post, ideally detected automatically, but with an option to override it. Once each post has a language tag, people should be able to choose their preferred language and posts in other languages should be hidden by default. Also provide an option to translate posts and comments.

* '''Expertise required''': Ruby on Rails
* '''Mentor''': Pirate Praveen, Ershad K
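
The tagging step could start from something as simple as the sketch below, which tags a post by the Unicode scripts of its letters and decides whether to hide it. This is only an illustration in Python: the function names are hypothetical, script is not the same as language (Hindi and Marathi both use Devanagari), and an actual Diaspora feature would be written in Ruby and would likely use a proper language-identification library.

```python
import unicodedata
from collections import Counter


def detect_scripts(text):
    """Return the scripts used by the letters in text, most frequent first."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Unicode character names start with the script, e.g. "MALAYALAM LETTER MA".
            counts[name.split()[0]] += 1
    return [script for script, _ in counts.most_common()]


def is_hidden(text, preferred_scripts):
    """Hide a post only if it has letters and none are in a preferred script."""
    scripts = detect_scripts(text)
    return bool(scripts) and not (set(scripts) & set(preferred_scripts))
```

A manual override, as the task description asks for, would simply replace the detected tag stored with the post.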

==Varnam Based==

Varnam is a cross-platform predictive transliterator for Indian languages. It works mostly like Google's transliterate, but shows key differences in the way word tokenization is done. It has a learning system built in which allows Varnam to make smart predictions.

Varnam clients are available as a [https://addons.mozilla.org/en-US/firefox/addon/varnam-transliteration-base/ Firefox] addon, a [https://chrome.google.com/webstore/detail/varnam-ime/abcfkeabpcanobhdmcmdabejaamephaf Chrome addon] and an [https://gitorious.org/varnamproject/libvarnam-ibus/source/d939adf50024013902c27310c03ef21a9210cdcb IBus engine].

To try out Varnam, navigate to [http://varnamproject.com/editor http://varnamproject.com/editor]. Currently it supports Hindi and Malayalam.

===Improvements to the REST API===

'''Project''':

This includes rewriting the current implementation in golang and adding WebSocket support to improve the input experience. It also includes writing scripts that make it easy to embed input on any webpage. All the changes will go live on [1].

* '''Expertise required''': Basic understanding of golang and C
* '''Complexity''': Advanced
* '''Mentor''': Navaneeth S

===Improve the learning system===

'''Project''':

The main goal of this project is to improve how varnam tokenizes when learning words. Today, when a word is learned, varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. Sometimes this is not enough to produce good suggestions. The suggested improvement is to try to infer the base form of the word being learned.

* '''Expertise required''': Knowledge in C
* '''Complexity''': Medium
* '''Mentor''': Navaneeth S
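
To make the current behaviour concrete, here is a small Python sketch of prefix learning. It is an illustration only, not libvarnam code (which is in C): the token list is assumed to come from an existing tokenizer, and the <code>prefixes</code> and <code>learn</code> names are hypothetical.

```python
from collections import Counter


def prefixes(tokens):
    """Return every prefix of a tokenized word, shortest first."""
    return ["".join(tokens[:end]) for end in range(1, len(tokens) + 1)]


def learn(store, tokens):
    """Record one occurrence of each prefix of the word in the store."""
    for prefix in prefixes(tokens):
        store[prefix] += 1
    return store
```

Inferring a base form would add one more step before <code>learn</code>: stripping known suffixes so that inflected forms reinforce the same stem.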

=== Word corpus synchronization ===

'''Project''':

Create a cross-platform synchronization tool that can upload and download the word corpus from offline IMEs like varnam-ibus[2]. This helps to build the online word corpus easily.

* '''Expertise required''': Knowledge in C/golang
* '''Complexity''' : Medium
* '''Mentor''': Navaneeth S

* [1]: http://www.varnamproject.com
* [2]: https://gitorious.org/varnamproject/libvarnam-ibus/

==Adding Braille Keyboard layouts for Indian Languages to m17n Library==

'''Project''':

This project builds support for Bharati Braille keyboard layouts in GNU/Linux systems. Bharati Braille is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the SDF-JKL keys used for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily with the help of audio feedback from TTS.

* '''More Details'''
** http://www.acharya.gen.in:8080/disabilities/bh_brl.php
** http://en.wikipedia.org/wiki/Bharati_Braille
** http://www.nongnu.org/m17n/
* '''Mentor''': Anivar Aravind

=Projects with unconfirmed mentors=

==Grandham ==

=== Adding MARC21 import/export feature in Grandham ===

We need a feature in Grandham to import and parse data from MARC21 documents. We should also be able to export existing data in MARC21.

'''Expertise required''': Knowledge in Ruby/Ruby on Rails

'''Complexity''' : High

* [1]: http://dev.grandham.org
* [2]: https://github.com/smc/grandham

Automated rendering testing

2014-02-21T19:31:24Z

Nandaja: Created page with "A framework to test the correctness of output by rendering engines. Files required to test rendering using this framework: (Assuming the user has harfbuzz installed. If not..."

A framework to test the correctness of output by rendering engines.

(Assuming the user has harfbuzz installed. If not, build it from: https://github.com/behdad/harfbuzz)

Files required to test rendering using this framework:

* Test cases file
* Reference file for a specific font
* File with rendering outputs by engines
* Font file in ttf format

Create a test cases file that consists of all the words whose rendering you wish to test. Here is a sample test cases file created for the Malayalam language: https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/ml-test-data/ml-test-cases.txt

Along with this create the reference file that contains the correct glyph names of the words in the test cases file in a particular font. The framework assumes that the glyph names are in the following format: [glyph_name1,glyph_name2,glyph_name3,..,glyph_nameN]

If a word has more than one correct rendering, list the next correct one after the first, separated by a semicolon. For example: [glyph_name1,glyph_name2,glyph_name3,..,glyph_nameN];[glyph_name1,glyph_name2...,glyph_nameN];..

Here is the reference file for the above mentioned test cases file in the font Rachana: https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/ml-test-data/lohit-ml-test-data/lohit-glyph.txt

Now the file with rendering outputs. If the engine you are testing for is Harfbuzz, you can create this file using the following command:

cat ml-test-data.txt | hb-shape /path/to/Font.ttf > output.txt

If that is not the case, you will have to create the file yourself for the font you wish to test, and the rendering of each word must be in the form: [glyph_name1|glyph_name2|..]

Here is the harfbuzz rendering of the above mentioned test cases file in font Rachana: https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/ml-test-data/lohit-ml-test-data/hb_lohit_rendering.txt
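
The comparison between a reference line and a rendered line can be pictured with the sketch below. It is a minimal illustration of the two formats described above, not the code in rendering_test.py; the function names are hypothetical, and the <code>=cluster+advance</code> suffix that hb-shape attaches to each glyph name is simply dropped.

```python
def parse_reference(line):
    """Split '[a,b];[a,c];' into [['a', 'b'], ['a', 'c']]."""
    alternatives = []
    for part in line.strip().split(";"):
        part = part.strip()
        if part:
            alternatives.append([g.strip() for g in part.strip("[]").split(",")])
    return alternatives


def parse_rendering(line):
    """Split '[a=0+10|b=1+20]' or '[a|b]' into ['a', 'b']."""
    items = line.strip().strip("[]").split("|")
    return [item.split("=", 1)[0].strip() for item in items]


def rendering_is_correct(reference_line, rendered_line):
    """A rendering passes if it matches any of the accepted alternatives."""
    return parse_rendering(rendered_line) in parse_reference(reference_line)
```

Comparing whole glyph sequences keeps the check strict: a rendering that has the right glyphs in the wrong order is reported as wrong.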

Now that you have all the necessary files, write this data to an .ini file for the main script to read. Here is the structure:

 [main]
 Reference-file:
 Rendered-output:
 Font-file:
 Test-cases-file:
 Output-file:
 Shaping-engine:

Out of these, Reference-file and Rendered-output are mandatory. Comment out all the other lines if a result file is not necessary. If the shaping engine is harfbuzz, the output file will also contain the harfbuzz-rendered image of every word. These images will be stored inside a directory named 'hb_images'. Here is a sample: https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/ml-test-data/lohit-ml-test-data/Lohit-ml.ini
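
For readers who want to see how such a file might be consumed, the following sketch reads the settings with Python's standard <code>configparser</code> module and enforces the two mandatory keys. Whether rendering_test.py actually uses <code>configparser</code> is an assumption, the <code>load_settings</code> name is hypothetical, and the font file name in the sample is made up for illustration.

```python
import configparser

# Sample settings; the commented-out line shows how optional keys are disabled.
SAMPLE = """\
[main]
Reference-file: lohit-glyph.txt
Rendered-output: hb_lohit_rendering.txt
# Font-file: Lohit-Malayalam.ttf
Shaping-engine: harfbuzz
"""

MANDATORY = ("reference-file", "rendered-output")


def load_settings(text):
    """Parse the [main] section and check that the mandatory keys are present."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]
    for key in MANDATORY:
        if key not in main:
            raise ValueError(f"missing mandatory setting: {key}")
    # configparser lowercases option names by default.
    return dict(main)
```

Note that the returned keys are lowercase (for example <code>reference-file</code>), because <code>configparser</code> normalizes option names unless told otherwise.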

Now to test, run the script rendering_test.py passing the name of the .ini file as a parameter. For example:

./rendering_test.py Lohit-ml.ini

(In the repo one can find samples in four Malayalam fonts and one Devanagari font. Test cases file for Malayalam being https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/ml-test-data/ml-test-cases.txt and that for Devanagari being https://github.com/nandajavarma/Automated-Rendering-Testing/blob/master/devanagari-test-data/devanagari_test_cases.txt.)

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/SMC Camp/പാലക്കാട്

2013-09-26T16:39:20Z

Nandaja:

Camps in Palakkad district that are likely or confirmed to take place this week and next.

== N.S.S. Engineering College ==

*Date: 28/09/13
*Time: 10 - 12 in the morning
*Led by: Ershad, Arjun, Alfas

== Prime College for Women ==

*Date: October 3 (not confirmed)
*Time:
*Led by:

== Sreekrishnapuram Govt. Engineering College ==

*Date: October 7 or 8 (not confirmed)
*Time:
*Led by:

== P.M.G. Higher Secondary School ==

*Date: October 3 (not confirmed)
*Time:
*Led by:

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/SMC Camp/പാലക്കാട്

2013-09-26T16:37:54Z

Nandaja: Created page with "പാലക്കാട് ജില്ലയില്‍ ഈ ആഴ്ചയും അടുത്തതുമായി നടക്കാന്‍ സാധ്യതയു..."

Camps in Palakkad district that are likely or confirmed to take place this week and next.

== N.S.S. Engineering College ==

Date: 28/09/13
Time: 10 - 12 in the morning
Led by: Ershad, Arjun, Alfas

== Prime College for Women ==

Date: October 3 (not confirmed)
Time:
Led by:

== Sreekrishnapuram Govt. Engineering College ==

Date: October 7 or 8 (not confirmed)
Time:
Led by:

== P.M.G. Higher Secondary School ==

Date: October 3 (not confirmed)
Time:
Led by:

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:18:30Z

Nandaja: /* Programs */

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി 2013/സാങ്കേതികപ്രദര്‍ശനം/en|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===
SMC general body meeting. Discussion of the future plans of SMC.

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:14:27Z

Nandaja:

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി 2013/സാങ്കേതികപ്രദര്‍ശനം/en|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

വാര്‍ഷികപൊതുപരിപാടി 2013/സാങ്കേതികപ്രദര്‍ശനം/en

2013-09-17T08:14:00Z

Nandaja: Redirected page to സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/സാങ്കേതികപ്രദര്‍ശനം/en

#Redirect [[സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/സാങ്കേതികപ്രദര്‍ശനം/en]]

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:05:39Z

Nandaja:

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി_2013/സാങ്കേതികപ്രദര്‍ശനം|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:04:24Z

Nandaja:

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[http://wiki.smc.org.in/%E0%B4%B8%E0%B5%8D%E0%B4%B5%E0%B4%A4%E0%B4%A8%E0%B5%8D%E0%B4%A4%E0%B5%8D%E0%B4%B0%E0%B4%AE%E0%B4%B2%E0%B4%AF%E0%B4%BE%E0%B4%B3%E0%B4%82%E0%B4%95%E0%B4%AE%E0%B5%8D%E0%B4%AA%E0%B5%8D%E0%B4%AF%E0%B5%82%E0%B4%9F%E0%B5%8D%E0%B4%9F%E0%B4%BF%E0%B4%99%E0%B5%8D%E0%B4%99%E0%B4%BF%E0%B4%A8%E0%B5%8D%E0%B4%B1%E0%B5%86_%E0%B4%92%E0%B4%B0%E0%B5%81_%E0%B4%B5%E0%B5%8D%E0%B4%AF%E0%B4%BE%E0%B4%B4%E0%B4%B5%E0%B4%9F%E0%B5%8D%E0%B4%9F%E0%B4%82/%E0%B4%B8%E0%B4%BE%E0%B4%99%E0%B5%8D%E0%B4%95%E0%B5%87%E0%B4%A4%E0%B4%BF%E0%B4%95%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A6%E0%B4%B0%E0%B5%8D%E2%80%8D%E0%B4%B6%E0%B4%A8%E0%B4%82/en|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:01:38Z

Nandaja: /* Publicity programs */

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി_2013/സാങ്കേതികപ്രദര്‍ശനം|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:00:51Z

Nandaja: /* Programs */

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി_2013/സാങ്കേതികപ്രദര്‍ശനം|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T08:00:12Z

Nandaja:

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി_2013/സാങ്കേതികപ്രദര്‍ശനം/en|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/en

2013-09-17T07:59:39Z

Nandaja: Created page with "thumb| Twelfth anniversary programs of Swathanthra Malayalam Computing will take palce in Thrissur, Sahithya Academy Hall on 14th and 15th of October wit..."

[[File:Smc-logo.png|thumb|]]
The twelfth anniversary programs of Swathanthra Malayalam Computing will take place at Sahithya Academy Hall, Thrissur, on the 14th and 15th of October, with a wide variety of programs.

== Discussions ==
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/ഐസാറ്റ്_ഓഗസ്റ്റ്29|Ernakulam ISAT - 29th August, Thursday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/കൂടിയാലോചനകള്‍/പിജി_സെന്റര്‍_ഓഗസ്റ്റ്28|Thrissur PG center - August 28, Wednesday]]
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/IRC_25-08-2013|IRC Meeting - August 25, Sunday]]

== Publicity programs ==
To introduce SMC, at least one program is to be organized in every district with engineering colleges as venues.
* [[വാര്‍ഷികപൊതുപരിപാടി_2013/SMC_Camp|SMC camps]]

== Social media ==

== Programs ==
[[വാര്‍ഷികപൊതുപരിപാടി_2013/സാങ്കേതികപ്രദര്‍ശനം|Technical exhibitions]]
[[വാര്‍ഷികപൊതുപരിപാടി 2013/കാര്യപരിപാടി/വിക്കിസംഗമം|Wiki meetup]]

===Day 1===

===Day 2===
Inauguration 9 AM

Computer "Harisree"

* Session 1 History of Malayalam computing
* Session 2 Introducing main tools
Scribus

C-DAC

Parallel session: panel discussion.

===Day 3===

Presenting GSoC projects 9 AM

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/സാങ്കേതികപ്രദര്‍ശനം/en

2013-09-17T07:18:09Z

Nandaja:

The technical exhibitions for the twelfth anniversary programs of Swathanthra Malayalam Computing to be held on 14th and 15th of October at Thrissur, Sahithya Academy hall will be compiled in this page.

Nandaja, Arjun and Manoj are mainly in charge of these programs.

=== Malayalam Computing Milestones/History ===
Details have to be collected and documented. Posters are to be made. --> Anivar, Sooraj

=== Swathanthra Malayalam Computing ===
Milestones/history of SMC so far.

=== Freedom toaster + Install fest ===
Responsibility to get freedom toaster ready --> Anish, Sooraj

=== Fonts ===
(Fonts maintained by SMC, other free software fonts, Battathiri's Calligraphy, Magra, typography etc will be exhibited.)
Responsibility --> Arjun, Hiran Venugopalan

=== Wiki Grandhasala ===
Responsibility --> Manoj

=== Publication ===
(Books printed in original script. A similar idea was presented by Rachana Aksharavedi in last year's book fest.)
Responsibility --> Hussain Sir, Hrishikesh

=== Silpa project ===
Swathanthra Indian Language Processing Application
In charge --> Anish, Hrishikesh

=== Typing tools ===
(Varamozhi, keyman, Swanalekha, Inscript, Narayam, Lalitha, Varnam, ULS etc.)
Everything has to be documented neatly and made presentable
In charge --> Balu

=== Localization hut ===
(Introducing different localization projects)
Responsibility --> Ani peter

=== Dictionary - Olam ===
Responsibility --> Kailash Nath

=== Grandhasoochi ===
Responsibility --> Ershad, Anivar

=== Dhwani ===
Will be made presentable by Santhosh/Kavya

=== Android Malayalam ===
Responsibility --> Jishnu

=== Malayalam Mapping (GIS) ===
Responsibility ---> Jaisen Nedumpala

=== Payyans and Chathans (ASCII to Unicode) ===
Responsibility --> Manoj

=== Malayalam Spellchecker ===
Responsibility --> Santhosh/Kavya

=== Malayalam Capcha ===
Responsibility --> Hrishikesh

=== Letter splitter ===

=== Fortune Malayalam ===

=== Hyphenation ===

=== Paralperu ===

=== Sharika ===

=== Malayalam Matrix - Screen saver ===
Not much work.

=== Libre Office auto correct ===
Responsibility --> Manoj

=== Orca ===
Sathyasheelan sir will help

=== Sharada Braille Writer ===
Nalin will help

=== Scribus Malayalam ===
Anil will help

=== OCR ===

=== Machine translation ===
Responsibility --> Aboobacker

=== Scan tyler demo ===

=== Help desk ===
(Of free software user groups ilegtvm, ilegcochin, etc.)
Coordination --> Soorah Kennoth

=== Irumbanam school ===
Tux paint + Subtitle activities by students from Irumbanam School
Responsibility --> Sanal Kumar Sir

=== IT@School ===
IT@School has to be officially invited

=== Wikipedia + other wiki projects ===
A wiki page has been created for the event.

=== Bloggers' union ===

=== M3DB + Eenam ===
Responsibility --> Manoj

=== Malayalam Subtitles M-zone ===

=== Logopedia ===

=== Sayahnna ===
Manoj

=== Chamba ===

=== Diaspora and Savepoddery ===
Substitute social network.

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/സാങ്കേതികപ്രദര്‍ശനം/en

2013-09-17T07:17:42Z

Nandaja: /* = Fonts */

The technical exhibitions for the twelfth anniversary programs of Swathanthra Malayalam Computing to be held on 14th and 15th of October at Thrissur, Sahithya Academy hall will be compiled in this page.

Nandaja, Arjun and Manoj are mainly in charge of these programs.

=== Malayalam Computing Milestones/History ===
Details have to be collected and documented. Posters are to be made. --> Anivar, Sooraj

=== Swathanthra Malayalam Computing ===
Milestones/history of SMC so far.

=== Freedom toaster + Install fest ===
Responsibility to get freedom toaster ready --> Anish, Sooraj

=== Fonts ===
(Fonts maintained by SMC, other free software fonts, Battathiri's Calligraphy, Magra, typography etc will be exhibited.)
Responsibility --> Arjun, Hiran Venugopalan

=== Wiki Grandhasala ===
Responsibility --> Manoj

=== Publication ===
(Books printed in original script. A similar idea was presented by Rachana Aksharavedi in last year's book fest.)
Responsibility --> Hussain Sir, Hrishikesh

=== Silpa project ===
Swathanthra Indian Language Processing Application
In charge --> Anish, Hrishikesh

=== Typing tools ===
(Varamozhi, keyman, Swanalekha, Inscript, Narayam, Lalitha, Varnam, ULS etc.)
Everything has to be documented neatly and made presentable
In charge --> Balu

=== Localization hut ===
(Introducing different localization projects)
Responsibility --> Ani peter

=== Dictionary - Olam ===
Responsibility --> Kailash Nath

=== Grandhasoochi ===
Responsibility --> Ershad, Anivar

=== Dhwani ===
Will be made presentable by Santhosh/Kavya

=== Android Malayalam ===
Responsibility --> Jishnu

=== Malayalam Mapping (GIS) ===
Responsibility ---> Jaisen Nedumpala

=== Payyans and Chathans (ASCII to Unicode) ===
Responsibility --> Manoj

=== Malayalam Spellchecker ===
Responsibility --> Santhosh/Kavya

=== Malayalam Capcha ===
Responsibility --> Hrishikesh

=== Letter splitter ===

=== Fortune Malayalam ===

=== Hyphenation ===

=== Paralperu ===

=== Sharika ===

=== Malayalam Matrix - Screen saver ===
Not much work.

=== Libre Office auto correct ===
Responsibility --> Manoj

=== Orca ===
Sathyasheelan sir will help

=== Sharada Braille Writer ===
Nalin will help

=== Scribus Malayalam ===
Anil will help

=== OCR ===

=== Machine translation ===
Responsibility --> Aboobacker

=== Scan tyler demo ===

=== Help desk ===
(Of free software user groups ilegtvm, ilegcochin, etc.)
Coordination --> Soorah Kennoth

=== Irumbanam school ===
Tux paint + Subtitle activities by students from Irumbanam School
Responsibility --> Sanal Kumar Sir

=== IT@School ===
IT@School has to be officially invited

=== Wikipedia + other wiki projects ===
A wiki page has been created for the event.

=== Bloggers' union ===

=== M3DB + Eenam ===
Responsibility --> Manoj

=== Malayalam Subtitles M-zone ===

=== Logopedia ===

=== Sayahnna ===
Manoj

=== Chamba ===

=== Diaspora and Savepoddery ===
Substitute social network.

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/സാങ്കേതികപ്രദര്‍ശനം/en

2013-09-17T07:17:23Z

Nandaja: Created page with "The technical exhibitions for the twelfth anniversary programs of Swathanthra Malayalam Computing to be held on 14th and 15th of October at Thrissur, Sahithya Academy hall wil..."

The technical exhibitions for the twelfth anniversary programs of Swathanthra Malayalam Computing to be held on 14th and 15th of October at Thrissur, Sahithya Academy hall will be compiled in this page.

Nandaja, Arjun and Manoj are mainly in charge of these programs.

=== Malayalam Computing Milestones/History ===
Details have to be collected and documented. Posters are to be made. --> Anivar, Sooraj

=== Swathanthra Malayalam Computing ===
Milestones/history of SMC so far.

=== Freedom toaster + Install fest ===
Responsibility to get freedom toaster ready --> Anish, Sooraj

=== Fonts ===
(Fonts maintained by SMC, other free software fonts, Battathiri's Calligraphy, Magra, typography etc will be exhibited.)
Responsibility --> Arjun, Hiran Venugopalan

=== Wiki Grandhasala ===
Responsibility --> Manoj

=== Publication ===
(Books printed in original script. A similar idea was presented by Rachana Aksharavedi in last year's book fest.)
Responsibility --> Hussain Sir, Hrishikesh

=== Silpa project ===
Swathanthra Indian Language Processing Application
In charge --> Anish, Hrishikesh

=== Typing tools ===
(Varamozhi, keyman, Swanalekha, Inscript, Narayam, Lalitha, Varnam, ULS etc.)
Everything has to be documented neatly and made presentable
In charge --> Balu

=== Localization hut ===
(Introducing different localization projects)
Responsibility --> Ani peter

=== Dictionary - Olam ===
Responsibility --> Kailash Nath

=== Grandhasoochi ===
Responsibility --> Ershad, Anivar

=== Dhwani ===
Will be made presentable by Santhosh/Kavya

=== Android Malayalam ===
Responsibility --> Jishnu

=== Malayalam Mapping (GIS) ===
Responsibility ---> Jaisen Nedumpala

=== Payyans and Chathans (ASCII to Unicode) ===
Responsibility --> Manoj

=== Malayalam Spellchecker ===
Responsibility --> Santhosh/Kavya

=== Malayalam Capcha ===
Responsibility --> Hrishikesh

=== Letter splitter ===

=== Fortune Malayalam ===

=== Hyphenation ===

=== Paralperu ===

=== Sharika ===

=== Malayalam Matrix - Screen saver ===
Not much work.

=== Libre Office auto correct ===
Responsibility --> Manoj

=== Orca ===
Sathyasheelan sir will help

=== Sharada Braille Writer ===
Nalin will help

=== Scribus Malayalam ===
Anil will help

=== OCR ===

=== Machine translation ===
Responsibility --> Aboobacker

=== Scan tyler demo ===

=== Help desk ===
(Of free software user groups ilegtvm, ilegcochin, etc.)
Coordination --> Soorah Kennoth

=== Irumbanam school ===
Tux paint + Subtitle activities by students from Irumbanam School
Responsibility --> Sanal Kumar Sir

=== IT@School ===
IT@School has to be officially invited

=== Wikipedia + other wiki projects ===
A wiki page has been created for the event.

=== Bloggers' union ===

=== M3DB + Eenam ===
Responsibility --> Manoj

=== Malayalam Subtitles M-zone ===

=== Logopedia ===

=== Sayahnna ===
Manoj

=== Chamba ===

=== Diaspora and Savepoddery ===
Substitute social network.

User:Nandaja/GSoC 2013 Automated Rendering Testing

2013-09-09T10:05:39Z

Nandaja: /* Progress */

== '''Personal information''' ==

* '''Name''': Nandaja Varma
* '''Email Address''': <nandaja.varma AT gmail DOT com>
* '''Freenode IRC Nick''': gem
* '''University and current education''' ː BTech Computer Science, Calicut University (NSS College, Palakkad)
* '''Blog URL''': nandajavarma.wordpress.com

===Why do you want to work with the Swathanthra Malayalam Computing?===

Ever since I came to know about the activities of SMC (around the start of my second year at college), I have wanted to be a part of this community and make significant contributions to it. I see this as a great opportunity to do so. I would like to contribute through any other means possible as well.

===Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor?===

Yes, I have recently started contributing to SMC's Gnome localization team. I contribute to the Debian community as a packager, mainly packaging Ruby gems for Debian. I also recently got involved in digitization work with Malayalam Wikigrandhashala.

===Did you participate with the past GSoC programs, if so which years, which organizations?===

No, I did not.

===Do you have other obligations between May and August? Please note that we expect the Summer of Code to be a full time, 40 hours a week commitment ?===

I have no other obligations whatsoever during the proposed months. I will be able to make the 40-hours-a-week commitment to GSoC.

===Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2013 program, if yes, which area(s), you are interested in?===

Yes, most definitely. I would like to continue contributing to the localization work, as translation is one of my areas of interest. I would also like to make major contributions to SMC's rendering-fix work.

===Why should we choose you over other applicants?===

I understand how the rendering engine Harfbuzz works, and I have experimented with a couple of scripts that print the glyph index of a given text in a given font. For implementing my project idea, I have good knowledge of the C programming language and good reading and writing skills in Malayalam, which would definitely help me in creating the list of base glyph words for this project. I also have a clear understanding of the test rendering stack and its constituent modules.

== Project Description ==

===An Overview of your proposal===

Harfbuzz is an open source library for shaping Unicode text, specifically complex scripts. The main objective of this project is to develop an automated mechanism to test what Harfbuzz renders for different Indic languages. As of now, there is no actual mechanism to check whether Harfbuzz is rendering text correctly. As Harfbuzz is efficient, widely used, and likely to remain in use for a long time, the project is highly relevant. The proposed system will be able to test renderings in different Indic languages using different fonts.

===The need you think it fulfills===

Implementing the above idea can make sure that what is rendered by Harfbuzz is actually correct. Such a mechanism would make things easier for developers and users, because at present the only way to check is manual testing, which is time consuming and error prone. Also, anyone can get the renderings tested, whether or not she knows the language being rendered.

===Any relevant experience you have===

I have decent knowledge of the C programming language, in which Harfbuzz is implemented. I am quite familiar with the Harfbuzz architecture and its renderings. My knowledge of the text rendering stack, glyphs and Unicode encoding will also help me along. I also have experience in localization and digitization work which, I hope, will help me at some points of the project.

===How you intend to implement your proposal===

Harfbuzz is a shaping engine for Unicode text, especially complex scripts. It offers two utilities, hb-view and hb-shape, for viewing and testing the rendering. hb-view outputs a view of the rendered text in a given font, basically as an image, whereas hb-shape outputs the glyphs that the text maps to in that font. For example, if we give the command:
hb-shape Rachana.ttf മലയാളം , we get an output like this: [m1=0+1046|l3=1+1462|y1=2+1624|uni0D3E=2+826|lh=4+1134|uni0D02=4+856], which is basically the glyph index of the word 'മലയാളം': each entry is a glyph name followed by its cluster value and its advance. Glyphs represent the shapes that characters can have when they are rendered or displayed. OpenType is the prominent font standard used today. OpenType font technology deals with glyphs, whereas Unicode deals with characters. Glyph indices map a Unicode character to its corresponding glyph(s), so they are one of the most important things to deal with when it comes to rendering.
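
As a rough illustration of how such an output line could be reduced to glyph names, here is a minimal Python sketch. The function name glyph_names is hypothetical and not part of Harfbuzz; it assumes each entry has the form name=cluster+advance (possibly with an @offset part), as in the sample above.

```python
def glyph_names(hb_shape_output):
    """Return the glyph names found in one line of hb-shape output."""
    inner = hb_shape_output.strip().strip("[]")
    if not inner:
        return []
    # Entries are separated by '|'; the glyph name is the part before '='.
    return [item.split("=", 1)[0] for item in inner.split("|")]


sample = "[m1=0+1046|l3=1+1462|y1=2+1624|uni0D3E=2+826|lh=4+1134|uni0D02=4+856]"
print(glyph_names(sample))
```

Only the names are kept here; the cluster and advance values are dropped, which matches the idea of comparing glyph names first.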

So, to automate the testing, we will evaluate the output of hb-shape. As it shows the glyph index of any word we give as input, we can check this value for correctness. The methodology for checking correctness is as follows:
Create a baseline glyph words list that consists of each word and its corresponding glyph index for each font. This must contain the correct rendering of each of the words specified. We will have to create this list for every Indic language for which we plan to implement this testing. To create it, we can make use of FontForge, a font editor that shows the layout of each character in a font. Using the glyph index data that we fetch from FontForge, we can create a baseline glyph words table for different Indic languages. Obviously, we cannot create a table with every single character or character combination possible: that would be difficult, and it would also make the comparison much slower. So special care should be taken to create a table that consists of the most important characters that might go wrong and should not, special-case characters, etc. We have to pick the words or character combinations intelligently so as to significantly decrease the total number of entries in this list. The list can be provided as a separate text file and presumably loaded into a database, or into a more efficient data structure such as a hash table or a trie, so that entries can be searched quickly.

Then a program should be written, in C, to accept hb-shape output as input and check it against our baseline glyph words: find the exactly matching word and see whether the glyph indices match. If they do not, the word can be flagged as incorrectly rendered. It might also happen that the word being compared does not appear in the list we provide. This is where the quality of our word selection matters. We can assume either that the particular word or character is very rarely used or that the input word was given wrongly. If the same missing word is hit more than a certain number of times, we can say that our assumptions were wrong, and we can think of a mechanism to flag this word and then add its glyph index to the list, as an upgrade.
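
The lookup-and-flag idea can be sketched as follows. This is only an illustration under assumptions: it is written in Python rather than the C proposed here, the baseline is held in a plain dictionary (a hash table), and the class name BaselineChecker and the miss_threshold parameter are hypothetical.

```python
from collections import Counter


class BaselineChecker:
    """Check rendered glyph names against a baseline and count unknown words."""

    def __init__(self, baseline, miss_threshold=3):
        self.baseline = baseline              # word -> expected glyph names
        self.miss_threshold = miss_threshold  # hits before a word is proposed
        self.misses = Counter()

    def check(self, word, rendered):
        expected = self.baseline.get(word)
        if expected is None:
            # Word is not in the baseline list: remember how often it was asked for.
            self.misses[word] += 1
            return "unknown"
        return "correct" if rendered == expected else "wrong"

    def candidates(self):
        """Words missing from the baseline that were hit often enough to add."""
        return sorted(w for w, n in self.misses.items() if n >= self.miss_threshold)


checker = BaselineChecker({"word": ["g1", "g2"]}, miss_threshold=2)
print(checker.check("word", ["g1", "g2"]))
print(checker.check("other", ["g9"]))
```

A trie would serve the same purpose as the dictionary; the dictionary is used here only because it keeps the sketch short.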

Also, a web front end to this proposed library can be made, in PHP, to make it more user friendly than the command line.

===A rough timeline for your progress with phases===
* '''Week 1 - 2''' : Learn more about OpenType and Unicode. Learn well how font shapes are usually rendered in engines, how they appear when characters are combined into words, and how the glyph indices change as a result.
* '''Week 2 - 3''' : Create a list of Unicode words or characters that need to be tested against Harfbuzz output, preferably in Malayalam first. Select the words carefully so that the list is effective as well as concise.
* '''Week 4 - 5''' : Start coding the application with the collected data as the baseline.
* '''Week 6''' : Test the code on some Harfbuzz Malayalam renderings against the provided list and make changes accordingly so that it works correctly and faster.
* '''Week 7 - 8''' : Create the baseline glyph word index for as many Indic languages as possible, although there are time and linguistic barriers. I plan to collect it at least for Hindi.
* '''Week 9''' : Create the web front end for the application.
* '''Week 10''' : Testing, reviewing and documentation.

===Tell us about something you have created===

I have created a prototype search engine, using Hadoop in the back end and Python for further ranking processes, with a web page as an interface.

===Have you communicated with a potential mentor? If so who?===

Yes, I have communicated with the mentor Rajeesh K Nambiar.

===SMC Wiki link of your proposal===

[http://wiki.smc.org.in/index.php?title=User:Nandaja/GSoC_2013_Automated_Rendering_Testing SMC wiki link ]

==Progress==

===20/07/2013===

* Started coding for the project three days ago.
* The code I am currently developing needs, as input, a file with the list of words/characters whose rendering is to be tested, along with the correct glyph names of those words/characters. These are extracted manually from FontForge at the moment. Eg: ക[k1]
* The next file needed is one with the Harfbuzz rendering output for all the words/characters chosen for testing. A separate script is written for this purpose; executed on the test words file, it yields output of the form: ക[k1=0+1588]. This is actually the output of the hb-shape command. The value following the = will be ignored for now.
* In the testing script, the first file will be opened and the characters appearing before [, i.e. our word/character, will be read. Then, until the ] sign is encountered, the strings (dropping the =, the + and the digits that follow) will be added to an array. The same word will be looked up in the file of Harfbuzz rendering outputs and its glyph names will be similarly collected in an array.
* Then compare the two arrays. If both are the same we enter the value 0 in a check array: check[i] = 0. Otherwise, check[i] = 1.
* The last two steps are repeated until the end of the file is encountered.
* After that, we look up the check array. All the words at positions i with check[i] = 1 will be stored in a separate file.
* Finally we can run another script on this results file to get the hb-view output of these words, to get a better understanding of the rendering mistakes.
* Further corrections to the above algorithm will be updated periodically.
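
The steps above can be sketched roughly as follows. This is an illustration in Python, not the code in the repo; the function names are hypothetical, and it assumes that glyph names in the reference file are separated by commas while hb-shape separates entries with |.

```python
def parse_line(line, sep):
    """Split 'word[g1...]' into the word and its list of glyph names."""
    word, _, rest = line.strip().partition("[")
    items = rest.rstrip("]").split(sep) if rest else []
    # Keep only the glyph name, dropping '=cluster+advance' when present.
    return word, [item.split("=", 1)[0] for item in items if item]


def find_wrong(reference_lines, rendered_lines):
    """Return the check array (0 = match, 1 = mismatch) and the wrong words."""
    rendered = dict(parse_line(line, "|") for line in rendered_lines)
    check, wrong = [], []
    for line in reference_lines:
        word, expected = parse_line(line, ",")
        ok = rendered.get(word) == expected
        check.append(0 if ok else 1)
        if not ok:
            wrong.append(word)
    return check, wrong


check, wrong = find_wrong(["ക[k1]"], ["ക[k1=0+1588]"])
print(check, wrong)
```

A word that is missing from the rendering file is treated as a mismatch here, which is one possible choice among several.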

===29/07/2013===

Coding period for GSoC has started the past week and I have been working on a very simple implementation of the proposal in C and two tiny bash scripts. My code is available here: https://gitlab.com/gem/automated-rendering-testing

The first thing to do when testing with these scripts is to create a file that contains the set of words whose rendering is to be checked. Here I have taken a sample test data file created by SMC a while ago (ml-harfbuzz-testdata.txt). Now pass this file through the script render_test.sh along with the necessary font file. That is:

./render_test.sh ml-harfbuzz-testdata.txt /path/to/fontfile

This will create a file named rendered_glyphs.txt that contains the output of hb-shape function of harfbuzz, i.e. the glyph name followed by some additional numbers (which will be ignored for now).

Now create a file that contains the actual glyph names of the words in the test data file. I got the data from FontForge. This file has to be created manually and, as of now, must follow this structure:

[glyph11,glyph12,glyph13,...,glyph1n]

[glyph21,glyph22,glyph23,...,glyph2n]

.

.

.

Also make sure that the glyph names of each word are in the same order as the corresponding words in the test data file. I have named it orig_glyphs.txt. Once this is done, we can pass the above two files to the executable built from rendering_testing.c, say rendering_testing. That is:

./rendering_testing orig_glyphs.txt rendered_glyphs.txt

This program will compare the glyphs in order and, if it finds any pairs that don't match, it will write to a file, result.txt, the line number at which the word appears in the test data file. Otherwise it will tell you the renderings are perfect.

Once this is done, to see the words with wrong renderings we will have to run the third script show_rendering.sh. It takes as input the result.txt file, the test data file and also the font file. That is:

./show_rendering.sh result.txt ml-harfbuzz-testdata.txt /path/to/fontfile

This script will create png images of the wrongly rendered words in the current directory.
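
For illustration, the last step could look something like this in Python. This is a sketch under assumptions and not the actual show_rendering.sh: it assumes result.txt holds one line number per line, the output file names wrong_N.png are made up, and the commands are only built here, not executed.

```python
def read_failed(result_text):
    """Parse the contents of result.txt into a list of line numbers."""
    return [int(token) for token in result_text.split()]


def hb_view_commands(failed_line_numbers, words, font_path):
    """Build one hb-view command per wrongly rendered word."""
    commands = []
    for number in failed_line_numbers:
        word = words[number - 1]  # line numbers in result.txt start at 1
        output = "--output-file=wrong_%d.png" % number
        commands.append(["hb-view", output, font_path, word])
    return commands


failed = read_failed("2\n3\n")
for command in hb_view_commands(failed, ["one", "two", "three"], "/path/to/fontfile"):
    print(command)
```

Each command list could then be passed to subprocess.run to produce the images, provided hb-view is installed.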

That is all about my scripts. But the C code is very inefficient. It even crashes with segmentation faults on some files. Once I have made sure that I am on the right path after discussing with my mentor, I will be working on improving my algorithm and making this code better. That would be my next week’s work.

===14/07/2013===

This week I've been working on generating a baseline glyphs file for 4 fonts: Rachana, Meera, Suruma and Lohit-Malayalam. I have selected some Malayalam words from the harfbuzz tree and Santhosh Thottingal's test cases which I thought would be enough to test rendering problems. Then I started listing the glyph names of these words for each font in separate text files. To get the Unicode code points of each word, I wrote a small Java program. I ran it on each word, found all the code points and made 4 text files that contain the corresponding glyph names for the four fonts I mentioned earlier.
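
The code point lookup itself is small. Here is a rough equivalent in Python (not the Java program mentioned above; the function name code_points is just for illustration):

```python
def code_points(word):
    """Return the Unicode code points of a word in U+XXXX notation."""
    return ["U+%04X" % ord(character) for character in word]


for point in code_points("മലയാളം"):
    print(point)
```

Glyph names such as uni0D3E in hb-shape output are built from these same code point values, which is why the list is handy when filling in the reference files by hand.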

Although my mentor did tell me that it is not possible to generate glyph names automatically, I wasted more than a couple of days on a FontForge script to make it output the glyph names automatically. But it gives the glyph name only if we click on each character, which was terribly disappointing. So instead I used it to build the baseline glyphs file in the structure I want as I click on the necessary characters. But this code is trivial as far as rendering testing is concerned, so I will leave it out from now on (just noting it down as it wasted a very non-trivial amount of my time ;-) ).

I have modified the main C code such that it will ask the tester which font she wants and after choosing the one she needs it will output the result based on the words I have given.

But my mentor pointed out that having code in 3 different languages for a single framework looks quite messy, so I'll be rewriting my code in Python this week.

You can find my code here: https://github.com/nandajavarma/Automated-Rendering-Testing (although the README is not up-to-date)

(The above content is from my blog: http://nandajavarma.wordpress.com/)

===21/7/13===

This week my main task was to migrate my code to Python. As of now I have
implemented my algorithm in Python. Here is the link to the repo: https://gitlab.com/gem/automated-rendering-testing/tree/master

I have expanded my list of test cases a bit. It now has 243 Malayalam words.
I have manually created files with the glyph names of these test cases in
four fonts, Rachana, Meera, Suruma and Lohit-Malayalam, in files named
rachana-glyph.txt, meera-glyph.txt etc. (They are still a bit buggy, so I
haven't pushed the latest commit of these yet.)

What the code basically does is ask the tester which font she/he wants to
test with. Say it is Meera. The code will look for the reference file which
we have manually created and the file with the Harfbuzz renderings of the
test cases, named hb_meera_rendering.txt. This file can be created by
running the harfbuzzrendering.py script with the proper font files in the
current directory. The main script rendering_testing.py will scan both
these files, compare the glyph names corresponding to each word and store
the wrongly rendered words in a new list. Finally hb-view will be executed
on the words in this list, and a file named output.png that shows the wrong
renderings will be generated in the same directory.

One can even provide a separate test cases file (after preparing the
reference file in the specified structure) and/or a separate font (after
generating the renderings file directly by running hb-view on the test
cases). If the font file of any of the four given fonts is updated, just
copy the new version and execute the harfbuzzrendering.py script. Testing
can then be done as mentioned earlier.

The baseline glyph name files are not yet complete for all the 243 words.
I will be able to complete them within 1-2 days.

===28/7/13===

Work this week has been a little slow because of college exams and assignments. This is what I have done so far this week.

I have completed the reference files containing the glyph names of the 243 words for each of the four fonts: Rachana, Meera, Suruma and Lohit-Malayalam.

The code has been modified to handle not only Harfbuzz renderings but also renderings from other engines like Uniscribe, provided the user produces the output of the rendering engine herself/himself. I have created a Python package containing 2 modules, one each for testing and for creating output. The main script automated_rendering_testing.py makes use of this package to test and give the final result. To test the framework, one can just run ./automated_rendering_testing and then provide the necessary information when asked.

Coming to the tester: first it compares the reference file and the rendering output. Then it creates a file named result.txt containing the wrongly rendered words along with the number of each word in the test cases’ file. This file is used only to create the PNG file of the wrongly rendered words if the engine is Harfbuzz; otherwise it is ignored. The actual output is a file test_result.txt with the format:

Sl.No Word Rendering status(correct/wrong)

The user can view this file, check the status and see the wrongly rendered words.
The agenda for this week is to re-write the whole code in C.
One can view code from here: https://github.com/nandajavarma/Automated-Rendering-Testing
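
To make the report format concrete, here is a small Python sketch of how such lines could be produced. It is an illustration only: the function name format_report and the tab separator are assumptions, not taken from the repo.

```python
def format_report(words, statuses):
    """Build 'Sl.No<TAB>Word<TAB>status' lines for test_result.txt."""
    lines = []
    for serial, (word, ok) in enumerate(zip(words, statuses), start=1):
        status = "correct" if ok else "wrong"
        lines.append("%d\t%s\t%s" % (serial, word, status))
    return "\n".join(lines)


print(format_report(["first", "second"], [True, False]))
```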

===11/06/2013===

The following modifications were asked to be made on the existing framework by my mentor after a Hangout session as part of the evaluations:

1. Modify the comparison algorithm so as to show positive results for the words with multiple correct renderings - This modification is made. Now, the user can give multiple glyph names separated by comma in the reference file and if the rendering matches any one of these, the framework will return a positive response.

2. Modify the reference glyph file, adding the glyph names of words with multiple correct renderings. Also some corrections were asked to be made in the existing reference file.

3. Modify the framework such that the user can even test by giving the file names as parameters. This one needs a little more work as I didn't give options in argument parser for all the necessary file inputs. Will update this soon.

Along with these some minor fixes were asked to be done on the script and all those are taken care of.
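
The first modification can be sketched like this in Python. The entry layout is an assumption made for the example: glyph names inside one rendering are separated by commas and alternative renderings by semicolons; the function names are hypothetical.

```python
def parse_alternatives(entry):
    """Split a reference entry into its list of accepted renderings."""
    inner = entry.strip().strip("[]")
    return [alternative.split(",") for alternative in inner.split(";") if alternative]


def is_correct(rendered, entry):
    """True if the rendered glyph names match any accepted rendering."""
    return rendered in parse_alternatives(entry)


entry = "[g1,g2;g3]"
print(is_correct(["g3"], entry))
print(is_correct(["g1"], entry))
```

With this layout a word passes as soon as the engine output equals any one of the listed alternatives.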

As for further development, I plan to create a web interface for this framework. I am trying to build it using Flask and am currently working on it.
After that, the framework will be implemented in C. I have added a partially working implementation of this to the repo.
Once all of this is complete, references for other fonts are also planned, if time permits.

Find my code here: https://gitlab.com/gem/automated-rendering-testing/tree/master

===17/08/2013===

I have changed the framework interface from its previous form, although the previous front end automated_rendering_testing.py is still present in the repo. The new interface, rendering_testing.py, needs all the file names to be provided as command line arguments. This way the user gets the convenience of tab completion. The user has to give 6 files as command line arguments (font file, test cases file, reference file, rendering output and files to store output) and an optional directory name (if the engine is Harfbuzz).

If the rendering engine is harfbuzz, user can run the script generate_hb_rendering.py along with the test cases file and font file as parameters, to create the rendered output file. If that is not the case, the user will have to create this file as well in the prescribed form.
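
Roughly, producing the rendered output file for Harfbuzz amounts to running hb-shape once per test word. The sketch below is an illustration and not the actual generate_hb_rendering.py; the function name shape_words and the replaceable run parameter are assumptions, the latter added so the logic can be tried without Harfbuzz installed.

```python
import subprocess


def shape_words(words, font_path, run=None):
    """Return one 'word[glyphs]' line per word using hb-shape."""
    if run is None:
        def run(command):
            # Calls the real hb-shape binary; it must be on the PATH.
            result = subprocess.run(command, capture_output=True, text=True, check=True)
            return result.stdout
    lines = []
    for word in words:
        output = run(["hb-shape", font_path, word]).strip()
        lines.append(word + output)
    return lines


# Stand-in runner so the example works without hb-shape installed.
def fake_run(command):
    return "[k1=0+1588]\n"


print(shape_words(["ക"], "Rachana.ttf", run=fake_run))
```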

The algorithm that actually tests the rendering was a bit buggy and was giving certain wrong outputs for words with multiple correct renderings; I have fixed this error. This feature now gives correct output for the files I tried it with.
The next thing I am working on is the web interface, for which I am using the Flask framework. I will make this code public as soon as I get the script running from the page.
Find the code here: https://gitlab.com/gem/automated-rendering-testing/tree/master
More info in the README

===25/08/2013===

My work has been correcting the reference glyph files and developing a web interface for the proposed framework. I have tried to make the reference files as bug-free as possible. I have gone through the glyph names of almost all the 243 words in 4 fonts. I had to invest a lot of time in this, especially due to a minor misunderstanding of mine about the multiple correct renderings of words. I hope it will get much more refined after Rajeeshettan proofreads it for 2 fonts, as he has suggested.
(I have changed the renderings of words with repham in Rachana such that the dotreph comes first. So words like these http://troll.ws/image/2e3a872e, http://troll.ws/image/469dd87a, http://troll.ws/image/5838dbec, although they look correct, will be in the list of words wrongly rendered by Harfbuzz.)

The next part of this week's work was developing the web interface (excuse my poor design, I am cleaning it up as I write). It doesn't actually show output to the user yet, nor does it make it easy for the user to open files. I hope to get it running the script well in a week's time and don't think it is ready for review yet. So I would like another week to make it ready for review.

And finally, about the C code I have added to the repo: I will start working on new code in C++ once I am done with the web page, as I find the present code massively buggy and really inefficient. I hope I'll be able to update it the week after next.

My code here: https://gitlab.com/gem/automated-rendering-testing/tree/master

===9/09/2013===

Here is the present status of the project.

* The testing framework can now evaluate words with multiple correct renderings, provided the correct renderings are given in the reference file separated by semicolons.

* The reference glyphs for both Rachana and Meera have been updated as per the latest changes in glyph names upstream.

* A reference for a Devanagari font is being added to the repo.

Present status of the framework is:

* rendering_test.py can accept up to 7 inputs: the test cases file, reference file, rendered output file, font file, output file, error file and a directory name.
* Of these, everything except the reference file and the rendered output is optional.
* Output will be produced according to the parameters passed.
* The PEP 8 errors reported earlier have been fixed.
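
A command line of this shape can be sketched with argparse. The option names below are hypothetical and chosen only for the example; the real script may name and order its arguments differently.

```python
import argparse


def build_parser():
    """Two required files plus five optional inputs, as described above."""
    parser = argparse.ArgumentParser(prog="rendering_test.py")
    parser.add_argument("reference", help="reference glyph names file")
    parser.add_argument("rendered", help="rendering engine output file")
    parser.add_argument("--test-cases", help="file with the test words")
    parser.add_argument("--font", help="font file")
    parser.add_argument("--output", help="file to store the results")
    parser.add_argument("--errors", help="file to store wrongly rendered words")
    parser.add_argument("--image-dir", help="directory for hb-view images")
    return parser


args = build_parser().parse_args(["meera-glyph.txt", "hb_meera_rendering.txt"])
print(args.reference, args.rendered, args.font)
```

Keeping the two mandatory files positional and the rest as options mirrors the rule that only the reference file and the rendered output are required.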

By the end of this week, I am planning to finish:

* Complete Devanagari references
* The immediate next priority is the C++ implementation of the code, so I will be working on that.
* Proofread Suruma and Lohit-Malayalam test cases

Once this is all done, I will work on the web interface.

Find my code here: https://github.com/nandajavarma/Automated-Rendering-Testing

സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/SMC Camp/Level1

2013-08-29T23:03:24Z

Nandaja: /* Nandaja */

The following people have come forward to take classes at the camps.
{{TOC hidden}}
== Ershad ==

== Aneesh ==

== Balasankar ==
* Will run a small program/code and demonstrate it.
* Will describe how it works.
* Will explain its source code.
* Will help the students discover its other possibilities on their own and implement them.
*That activity includes the following:
** Bug reporting
**Error diagnosis
**Error correction
**Seeking help
**Bug handling
**Version control
**Commenting
**Documentation
**Variable naming
**Optimal coding

Used for the examples:
*Language: Python
*Project used: Silpa (most likely)
*Bug reporting tool: Bugzilla
* Version control: git
* Editor: vi/nano/gedit

System Requirements:
OS: Debian wheezy
Software: git, Python, Bugzilla

== Sreehari ==

== Nandaja ==

*How we can simplify our work with the help of Linux.
*Introduce very useful Unix commands.
*Demonstrate small, interesting scripts using grep and similar tools to show the possibilities of the command line.
*Introduce tools that make programming easier and demonstrate what they can do (Emacs, Vim, zsh and so on).
*A short session on the Debian distribution.
**Packages, the Debian archive.
**Also, how community projects are run.
**Bug reporting
**Topics such as mailing lists
**Will describe how we can become part of this community, how to contribute to open source projects, and the benefits that can come from doing so.

System Requirements: OS: Debian

== Jishnu ==

== Hrishikesh ==

== Manukrishnan ==

== Adil ==

== Abubakar ==

== Rahul ==

== Sreenath ==

== Sajad ==

Talk:സ്വതന്ത്രമലയാളംകമ്പ്യൂട്ടിങ്ങിന്റെ ഒരു വ്യാഴവട്ടം/SMC Camp

2013-08-29T20:23:20Z

Nandaja: /* Outline - Nandaja */

A short discussion on how the SMC camp structure should be.

In the early days, many camps were held in the name of translation. Apart from creating a bit of a stir and bringing in a few new faces, those camps had nothing that could be called an output. We do not expect much more here either. Even so, this time we should be a little more tuned. Let me sketch a picture; if everyone joins in, we can make it good. I will describe its background.

Although the SMC anniversary is meant for the general public, in effect its main stakeholders are a small group of what you might call techies or computer-savvy people. Such people are found in the greatest numbers in engineering colleges. So our first phase will be in engineering colleges.

We will hold the first phase in selected colleges (at least 16). What follows is a rough picture; everyone can dig in and either clean it up or mess it up. One advantage is that if it turns out well, plenty of people will step up to claim paternity. :)

We should be able to give a picture of everything mentioned in the list. The point is not to say all of it everywhere; we can filter it according to the audience. Still, something like this can be used to get a rough idea.

==An outline - Balu==
College kids will be interested in creating something, or in modifying something. So my idea is this:
*We show a simple thing...
*They will like it... (we will only show that kind of thing)
*We explain, in simple terms, how it was made..
*Then they will think, "This is pretty good, so why don't we have a go at it too?"
*Then comes the thought of modifying it a little: "Hey, if this were changed the way I say, wouldn't it be a bit nicer... let's change it and see...".
*And then along comes an error... We explain how to fix it... a quick tour of error diagnosis, error correction, and how to find help for it....
*With the errors fixed, how about tidying it up a bit?? "Look, mate, I made something.. this is a pretty good business..".
<big><big><big>'''That is my whole aim.. to create that interest in them.'''</big></big></big>

*Now, for the next level: I came across a piece of software.. it had a bug.. how do I let its author know?? A quick tour of bug reporting..
*OK, someone reported a bug in software I made... what do I do.. how do I try out a change without affecting what already works?? We will also run through bug handling and version control.
*Finally: the software is all fine.. but how do we document it more neatly, in a way that ordinary people can read and understand.. how do we choose understandable variable names and, above all, how do we get the most output from the fewest resources? Explain that.

Forgive me if this came out in informal language. I just said what came to mind. That's all. --[[User:Balasankarc|Balasankarc]] ([[User talk:Balasankarc|talk]]) 12:23, 29 August 2013 (PDT)
:Please don't kill us --[[User:Manojk|[[User:Manojk|മനോജ്.കെ|Manoj. K]] ([[User_talk:Manojk|Talk]])]] ([[User talk:Manojk|talk]]) 12:49, 29 August 2013 (PDT)

==Outline - Nandaja==
In my opinion, the outline of the first-phase camps, aimed at engineering students, should be as follows:

*How we can simplify our work with the help of Linux.
*Introduce very useful Unix commands.
*Demonstrate small, interesting scripts using grep and similar tools to show the possibilities of the command line.
*Introduce tools that make programming easier and demonstrate what they can do. (I have Emacs, Vim, zsh and the like in mind)
*I would like to demonstrate all of the above on Debian.
*After these, a short session on the Debian distribution.
*Cover packages, the Debian archive and so on, as far as possible.
*Also, how community projects are run. Topics such as bug reporting and mailing lists can also be explained using Debian as an example.
*If we point out how we can become part of this community, how to contribute to open source projects, and the benefits that can come from doing so, we may be able to attract the attention of computer science students.
*I believe that through this we may also be able to put forward the idea of SMC.