GSoC/2013/Project ideas: Difference between revisions

From SMC Wiki
 


=Mentors=
# Santhosh Thottingal
# Santhosh Thottingal ('''santhosh''' on irc.freenode.net)
# Baiju M
# Baiju M ('''baijum''' on irc.freenode.net)
# Praveen A
# Praveen A ('''j4v4m4n''' on irc.freenode.net)
# Rajeesh K Nambiar
# Rajeesh K Nambiar ('''rajeeshknambiar''' on irc.freenode.net)
# Vasudev Kammath
# Vasudev Kammath ('''copyninja''' on irc.freenode.net)
# Hussain K.H
# Jishnu Mohan ('''jishnu7''' on irc.freenode.net)
# Jishnu Mohan
# Hrishikesh K.B ('''stultus''' on irc.freenode.net)
# Hrishikesh K.B
# Anivar Aravind ('''anivar''' on irc.freenode.net)
# Anivar Aravind
# Anilkumar K V ('''anilkumar''' on irc.freenode.net)
# Sajjad Anwar ('''geohacker''' on irc.freenode.net)
# Deepa V Gopinath ('''deepagopinath''' on irc.freenode.net)
# Jain Basil ('''jainbasil''' on irc.freenode.net)


=Ideas for Google Summer of Code 2013=
* Please Read the [http://wiki.smc.org.in/SoC/2013#FAQ FAQ]


<big>'''Apart from the following ideas , you can propose your own ideas'''</big>


== Internationalize SILPA project with Wikimedia jquery projects==
If you want to propose an idea, please do it on the [http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in project mailing list].
'''Project''':
 
SILPA project has many Indic language applications, but as of now, if somebody want to input in Indian languages, there is no built in tool in it. Similarly, the application is not internationalized. Both of these can be achieved by using the jquery.ime and jquery.i18n libraries from Wikimedia. A sample implementation is avaliable in our [http://smc.org.in website]. The i18n should be in the SILPA flask framework with a nice templating system. Similarly the interface should have webfonts using jquery.webfonts library.
 
* '''[https://savannah.nongnu.org/task/index.php?12557 Savannah Task]'''
* '''Expertise required''': jquery, css, html5, python
* '''Mentor''' : Hrishikesh
====More Details====
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]


== A spell checker for Indic language that understands inflections ==


The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm, but it still cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has about 150,000 words. We could expand it, but that adds little value, because words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a mechanism for this, but so far nobody has got it working for the multi-level suffix stripping that Malayalam requires. A Malayalam word can sometimes be formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.
Recently, a Tamil-language effort attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi.
Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.


* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''
* '''Expertise required''': Basic understanding of grammar system of at least one Indian language
* '''Expertise required''': Average level understanding of grammar system of at least one Indian language
 
* Complexity: Advanced
* '''Mentor''' : Santhosh Thottingal


'''Project''':


ConTeXt is another TeX macro system similar to LaTeX but much more suitable for design. To find more information about ConTeXt, see the wiki http://wiki.contextgarden.net/Main_Page. ConTeXt MKII is supposed to have Indic language rendering support using XeTeX, but in practice we have found it lacking. MKII is deprecated anyway, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add support to Inidic rendering to ConTeXt MKIV. XeTeX is using Harfbuzz to do correct Indic rendering.
ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.


* '''[https://savannah.nongnu.org/task/index.php?12559 Savannah Task]
* '''Mentor''' : Rajeesh K Nambiar


==Automated Rendering Testing==
* '''More Details''': ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
'''Project''':
\usemodule[simplefonts]
 
\definefontfeature[malayalam][script=mlym]
Automated Rendering Testing system for Indic languages. Currently there exists 3 main rendering engines in computing world - Uniscribe of Microsoft, CoreText (Apple Advance Typography - AAT) of Apple and Harfbuzz for *nix systems. The Opentype font specification is maintained by Microsoft and implemented in Uniscribe, which is used as baseline for Harfbuzz. At present, there is no automated mechanism to determine if Harfbuzz is rendering complex Indic text correctly or not - someone expert in relevant language has to manually inspect the output from hb-view. The project aim is to identify and implement an automated method to test the rendering.
\setmainfont[Rachana][features=malayalam]
One method to do this might be to check the order of glyphs/glyph indices output by the rendering engine - this depends on the font too. A related topic is UTRRS https://fedorahosted.org/utrrs/, http://tdil-dc.in/utrrs/home/about
\starttext
 
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
'''Expertise required''':
\stoptext
Knowledge of Indic language rendering and Opentype specification.
Generate the output using command
 
texexec --xetex <file.tex>
'''Mentor''' : Rajeesh K Nambiar
 
== Create Bold and Italic variants for Meera and Rachana ==
'''Project''' :The Meera font has only regular version now. Synthetic bold and italic is not perfect and suitable for Malayalam. Create a bold and italic variant version for the Meera and Rachana fonts
 


'''Expertise required''':
==SILPA BASED==
Digital typography, good understanding of Malayalam writing system, fontforge, understanding of rendering engines like Harfbuzz.
 
'''Possible Mentor''' : Hussain K H
 
* '''[https://savannah.nongnu.org/task/index.php?12553 Savannah Task]'''
 
==Flask Based SILPA==
===Port remaining modules to the new flask based Silpa===
'''Project''': Silpa is being rewritten using the Flask framework. The core is almost complete, but most of the submodules written for the old framework still need to be ported to the [http://flasksilpa-indic.rhcloud.com/ new framework].
'''Expertise required''': Python , Flask , Jinja , HTML, Javascript


'''Mentor''' : Rajeeesh Nambiar/ Jishnu
'''Mentor''' : Vasudev/ Jishnu


===Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC===
'''Mentor''' : Vasudev/Jishnu


===add Urdu/Arabic support for all modules in SILPA.===


== Converting indic processing modules currently in SILPA into Jquery library  ==
'''Expertise required''':python,Flask, HTML,Javascript,CSS,jinja
'''Project''': Port some of the silpa algorithms to Jquery.
 
'''Expertise required''': javascript,JQuery, python


'''Mentor''' :Vasudev/Jishnu
'''Mentor''' : Vasudev


==  Improving cross language transliteration system.  ==
===  Improving cross language transliteration system.  ===
'''Project''':


Currently only Kannada and Malayalam transliterate well. All other languages are first converted to Malayalam and then to English, owing to a lack of knowledge of those languages' internals. Also, English-to-Indic transliteration currently uses CMUDict, so it is limited to the words in CMUDict; we could probably develop a better method for English-to-Indic transliteration.
CLDR has transliteration data for Indic languages. We can explore it and see the feasibility. For an intermediate representation of the scripts either IPA can be used or ISO 15919 standard can be used. All these must be supplemented with exception rules and special case handling to achieve more perfect result.


'''Expertise required''':python
'''Mentor''' : Vasudev/Jishnu


=== Internationalize SILPA project with Wikimedia jquery projects ,  Improve  the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it ===
'''Project''':


'''Internationalize SILPA''' :-  The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both can be achieved by using the [//github.com/wikimedia/jquery.ime jquery.ime] and [//github.com/wikimedia/jquery.i18n jquery.i18n] libraries from Wikimedia. A sample implementation is available on our [http://smc.org.in website]. The i18n should be done in the SILPA Flask framework with a nice templating system. Similarly, the interface should use webfonts via the [https://github.com/wikimedia/jquery.webfonts jquery.webfonts] library.


== Improving the webfonts module in Silpa using jquery.webfonts and proving more Indic and complex fonts as part of it. ==
'''Improve the webfonts ''' :-
'''Project''':
* Currently Silpa provides 36 webfonts. Add more fonts to this collection.
* Rewrite the webfonts module to use the features of jquery.webfonts
* Provide font preview and download options   


'''Expertise required''':jQuery, Python , technical understanding about fonts
====More Details====
* [https://github.com/wikimedia/jquery.i18n jquery.i18n]
* [https://github.com/wikimedia/jquery.ime jquery.ime]
* [https://github.com/wikimedia/jquery.webfonts jquery.webfonts]


'''Mentor''' : Vasudev/Jishnu
'''Expertise required''': jQuery, css, html5, Python , flask , technical understanding about fonts
 
'''Mentor''' : Hrishikesh K B / Sajjad Anwar
 
==Building a system and APIs for accessing and updating Malayalagrandham Bibliography Data==
[http://www.malayalagrandham.com/about/ Malayala Grantha Vivaram] is a project intended to make available reliable bibliographic information on all Malayalam books published in Kerala and elsewhere. This open data set contains complete bibliographic data from the first impression to 1995. This project aims to add the following features to the Malayalagrandham DB and build it into a bibliography web service:
* Facility for adding/linking copyright-expired books to Malayala Grantha Vivaram
* Adding ISBNs and ISBN-based seller discovery
* Building an interface for publishers through which they can contribute their publication bibliography
* A similar module for libraries, whose entries will be added to the "found in library" section of each book
* A module for building a QR code of a bibliography entry with a Malayalagrandham link
* A crowdsourced way to submit input, with an approval queue interface for submissions
* MARC21 and MARCXML support
* A dynamic visualisation interface for browsing book information
* Proper API and app workflow documentation
 
'''Expertise required''': Django / Ruby on Rails
 
'''Mentor''': Anivar
 
'''Related Links'''
* https://github.com/smc/malayalagrandham
* http://malayalagrandham.com/about


==Adding Indic Sript Rendering Support to GIS applications ==
== Adding Braille Keyboard layouts for Indian Languages to m17n Library==
===Add proper Indic / Malayalam rendering to Mapnik===
Mapnik is a free mapping toolkit, written in C++. One of it's major users is OpenStreetMap. If you check OpenStreetMap, you can see that Languages like Russian, Arabic, Persian, Chinese etc are rendered in it (Not sure whether they are properly rendered or not). The lack of proper Indic support is the major reason for the absence of Malayalam.


* '''[https://savannah.nongnu.org/task/index.php?12554 Savannah Task]'''
The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. The Bharati Braille standard is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys standing for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.


====More Details====
* http://mapnik.org/
* http://www.acharya.gen.in:8080/disabilities/bh_brl.php
* http://www.openstreetmap.org/
* http://en.wikipedia.org/wiki/Bharati_Braille
* http://www.nongnu.org/m17n/
 
'''Mentor''': Anilkumar K V


===Add Indic / Malayalam rendering to MapServer + OpenLayers stack.===
==Developing Malayalam Calendar Support ==
Both are OSGeo projects, and used in most of the WebGIS applications recently. MapServer is an open source development environment for building spatially enabled internet applications. OpenLayers is an open source JavaScript library for displaying map data in web browsers. OpenLayers is used by OpenStreetMap for its "slippy map" map interface.
 
* '''[https://savannah.nongnu.org/task/index.php?12555 Savannah Task]'''
This project is to add Malayalam calendar support to KDE. KOrganizer has support for many calendars, such as Saka, Hijri and Jalali. The project is to build support for the Malayalam calendar (Kolla Varsham).
 
The following tasks are part of this work:
* Build Malayalam (Kollavarsham) calendar support for KDE/KOrganizer.
* Add holidays and other special times.


====More Details====
* http://www.mapserver.org/
* Understanding of Astronomical Algorithms
* http://www.openlayers.org/
* http://malayalam.usvishakh.net/calendars/panj_theory.pdf
* http://www.osgeo.org/
* https://github.com/smc/smc/tree/master/calendar/java/src/org/panchanga/indic
 
==== Starting Points ====
 
* [https://github.com/smc/smc/tree/master/calendar/kde Build KDE from source]
* Familiarize with how Saka and other calendars are done.
 
'''Mentor''': Praveen A, Mahesh M
 
==Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx==
 
CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
 
'''Mentor''':  Deepa P. Gopinath
=== Background Reading ===
 
 
* [http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf 'Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition Systems'], Gopalakrishna  Anumanchipalli, Rahul Chitturi, Sachin Joshi, Rohit Kumar, Satinder Pal Singh, R.N.V. Sitaram, S P Kishore
 
* [http://www.aclweb.org/anthology/W/W12/W12-5808.pdf "Automatic Pronunciation Evaluation And Mispronunciation Detection Using CMUSphinx"], Ronanki Srikanth, James Salsman
 
* http://www.speech.cs.cmu.edu/
* http://cmusphinx.sourceforge.net/wiki/tutorial
 
* [http://www.ijarcsse.com "HTK Based Telugu Speech Recognition"], P. Vijai Bhaskar, AVNIET ,Hyderabad, Prof. Dr. S. Rama Mohan Rao, A.Gopi
 
* [http://www.cs.cmu.edu/~araza/Automatic_Speech_Recognition_System_for_Urdu.PDF "Design and  Development of an Automatic Speech Recognition System for Urdu"], Agha Ali Raza,  M.Sc. Thesis, FAST‐National University of Computer and Emerging Sciences
 
* [http://www.ccis2k.org/iajit/PDF/vol.6,no.2/11IASRUCSS186.pdf "Investigation Arabic Speech Recognition Using CMU Sphinx System"], Hassan Satori1, 2, Hussein Hiyassat3, Mostafa Harti1, 2, and Noureddine Chenfour


===Add proper Indic / Malayalam support and rendering to GRASS GIS.===
* [http://www.try.idv.tw/static-resources/homework/pr/PR_Final_Report.pdf "Understanding the CMU Sphinx Speech Recognition System"], Chun-Feng Liao
It is used by a number of organizations for analysing GIS data, creating maps etc. GRASS also is an OSGeo project. It is in the process of rewriting the old Tcl/Tk interface in the new wx-python.


* '''[https://savannah.nongnu.org/task/index.php?12556 Savannah Task]'''
==Enhancement of Indic Language Support in Scribus==


====More Details====
===Project Description===
* http://grass.osgeo.org/
 
Scribus is an open source program that brings professional page layout to Linux/UNIX. Find out more about Scribus [http://www.scribus.net here]. Indic languages are not yet completely supported in Scribus. As a result of work done by a group of developers from India, there is a branch in the Scribus public git repo that supports Indic languages. The aim of this idea is to enhance Indic language support by solving the following issues:

* Harfbuzz is currently bundled as third-party code in Scribus. Maintaining third-party code is a tedious task, so it should be added as a dependency instead.
* Enhance hyphenation by adding hyphenation rules.

'''Expertise Required:''' Good understanding of C, C++, Qt, Harfbuzz, cmake, etc.


'''Mentor''': Praveen A
===More information===


==Building a system and API's for accessing and upadating Malayalamgrandham Bibligiography Data==
* Branch indic in scribus public git repository. Clone url: git clone git://git.scribus.net/scribus.git
[http://www.malayalagrandham.com/about/ Malayala Grantha Vivaram]  is a project intended to make available reliable bibliographic information on all Malayalam books published in Kerala and elsewhere. This Open data set contains Complete bibliography data from first Impression to 1995. This project wants to add following features to Malayalagrandham DB and build it as a bibliography web service
* [[Hyphenation]]
* Facility for adding copyright expired books to malaylagrandha vivaram
* [http://wiki.scribus.net/canvas/Git#To_sum_it_up_:_how_to_get_and_compile_a_specific_branch_:_indic_of_scribus.git Compile scribus indic branch]
* Adding ISBN
* Building  Interface for Publishers through with they can contribute  their publication bibliography .
* Similar module for Libraries . That will be added to found in library section of each book
* A module for building qr code of bibliography with a malayalagrandham link . It can be used by publishers and libraries
* Crowd sourced way for input and an evaluation interface for submissions.
* MARC21 and MARCXML support


'''Expertise required''':
'''Mentor'''     : Jain Basil Aliyas (jainbasil in IRC)


'''Mentor''': Baiju M /Anivar
'''Complexity''' : Advanced

Latest revision as of 05:20, 26 January 2017

Mentors

  1. Santhosh Thottingal (santhosh on irc.freenode.net)
  2. Baiju M (baijum on irc.freenode.net)
  3. Praveen A (j4v4m4n on irc.freenode.net)
  4. Rajeesh K Nambiar (rajeeshknambiar on irc.freenode.net)
  5. Vasudev Kammath (copyninja on irc.freenode.net)
  6. Jishnu Mohan (jishnu7 on irc.freenode.net)
  7. Hrishikesh K.B (stultus on irc.freenode.net)
  8. Anivar Aravind (anivar on irc.freenode.net)
  9. Anilkumar K V (anilkumar on irc.freenode.net)
  10. Sajjad Anwar (geohacker on irc.freenode.net)
  11. Deepa V Gopinath (deepagopinath on irc.freenode.net)
  12. Jain Basil (jainbasil on irc.freenode.net)

Ideas for Google Summer of Code 2013

  • Please Read the FAQ

Apart from the following ideas, you can propose your own.

If you want to propose an idea, please do it on the project mailing list.

A spell checker for Indic language that understands inflections

Project:

The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm, but it still cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has about 150,000 words. We could expand it, but that adds little value, because words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a mechanism for this, but so far nobody has got it working for the multi-level suffix stripping that Malayalam requires. A Malayalam word can sometimes be formed by joining more than 5 words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt solving this with Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

Recently, a Tamil-language effort attempted to develop a spellchecker using Hunspell with multi-level suffix stripping. You can see the result here: https://github.com/thamizha/solthiruthi. Our first attempt should be to use Hunspell to achieve spellchecking with agglutination and inflection. This will probably require a lot of scripting to generate suffix patterns, and we can ask existing language communities for help. If Hunspell turns out to be limited in multi-level suffix handling (Indian languages sometimes require more than 5 levels of suffix stripping), we need to document the limitation (bug report and documentation) and attempt a Python-based solution on top of the SILPA framework.
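
As a rough illustration of multi-level suffix stripping, the sketch below peels suffixes off the end of a word recursively until a dictionary root is left. The root list, the suffix list and the romanised forms are made-up placeholders, not real Malayalam morphology, and sandhi changes at the joins are ignored; a real implementation would need per-language rules.

```python
# Toy data: hypothetical romanised roots and suffixes, for illustration only.
ROOTS = {"mara", "kutti"}
SUFFIXES = ["kal", "ude", "il", "um"]


def strip_suffixes(word, max_depth=6):
    """Return the suffixes peeled off to reach a known root, or None.

    The suffixes are returned in the order they appear in the word.
    max_depth bounds how many levels of stripping are attempted.
    """
    if word in ROOTS:
        return []
    if max_depth == 0:
        return None
    for suffix in SUFFIXES:
        # Only strip when something is left in front of the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = strip_suffixes(word[: -len(suffix)], max_depth - 1)
            if rest is not None:
                return rest + [suffix]
    return None


def is_known(word):
    """A word is accepted if it reduces to a dictionary root."""
    return strip_suffixes(word) is not None


print(strip_suffixes("marakaludeum"))  # three levels of stripping
print(is_known("marax"))               # no root can be reached
```

The recursion tries every suffix at every level, so it backtracks when one split leads nowhere; a table-driven approach would trade that flexibility for speed.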

  • Savannah Task
  • Expertise required: Intermediate understanding of the grammar system of at least one Indian language
  • Complexity: Advanced
  • Mentor : Santhosh Thottingal

Indic rendering support in ConTeXt

Project:

ConTeXt is another TeX macro system, similar to LaTeX but much more suitable for design. For more information about ConTeXt, see the wiki at http://wiki.contextgarden.net/Main_Page. ConTeXt MKII has Indic language rendering support using XeTeX, but MKII is deprecated, and the new MKIV backend doesn't support Indic rendering yet. The aim of this project is to add Indic rendering support to ConTeXt MKIV. XeTeX uses Harfbuzz to do correct Indic rendering.

  • Savannah Task
  • Expertise required: Understanding of the TeX system, experience with either LaTeX or ConTeXt, and a basic understanding of Indic language rendering. MKIV uses Lua, so familiarity with Lua, the OpenType specification or Harfbuzz will be an added advantage.
  • Mentor : Rajeesh K Nambiar
  • More Details: ConTeXt mkii (deprecated) can work with XeTeX backend for Indic rendering. Here is a sample file:
\usemodule[simplefonts]
\definefontfeature[malayalam][script=mlym]
\setmainfont[Rachana][features=malayalam]
\starttext
മലയാളം \TeX ഉപയോഗിച്ച് ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext

Generate the output using the command:

texexec --xetex <file.tex>

SILPA BASED

Port remaining modules to the new flask based Silpa

Project: Silpa is being rewritten using the Flask framework. The core is almost complete, but most of the submodules written for the old framework still need to be ported to the new framework.

Expertise required: Python , Flask , Jinja , HTML, Javascript

Mentor : Vasudev/ Jishnu

Provide REST API for new flask based Silpa, including conversion of templates to this REST API from JSON RPC

Project: Silpa currently relies on JSON-RPC. We need to either move completely to a REST API or provide a REST API as an additional feature.

Expertise required: Python , Flask , Jinja , HTML, Javascript

Mentor : Vasudev/Jishnu

Separate templates from SILPA and have it inside modules packaged for pypi

Project: The templates used for the user interface are part of SILPA. They should be separated out and shipped as part of the individual modules.

this should give more idea on it

Expertise required:python,Flask, HTML,Javascript,CSS,jinja

Mentor : Vasudev/Jishnu

Add Urdu/Arabic support for all modules in SILPA.

Expertise required:python,Flask, HTML,Javascript,CSS,jinja

Mentor : Vasudev

Improving cross language transliteration system.

Project:

Currently only Kannada and Malayalam transliterate well. All other languages are first converted to Malayalam and then to English, owing to a lack of knowledge of those languages' internals. Also, English-to-Indic transliteration currently uses CMUDict, so it is limited to the words in CMUDict; we could probably develop a better method for English-to-Indic transliteration.

CLDR has transliteration data for Indic languages. We can explore it and assess its feasibility. For an intermediate representation of the scripts, either IPA or the ISO 15919 standard can be used. All of these must be supplemented with exception rules and special-case handling to achieve better results.
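
The idea of an intermediate representation can be sketched as a two-step lookup: source script to a script-neutral token, then token to target script. The token names below are hypothetical and stand in for IPA or ISO 15919 symbols; only four letters per script are covered, and the exception rules mentioned above are not modelled.

```python
# Script-neutral token -> character, for a handful of letters only.
# Token names are hypothetical stand-ins for an IPA / ISO 15919 pivot.
TABLES = {
    "ml": {  # Malayalam
        "ka": "\u0d15",      # MALAYALAM LETTER KA
        "va": "\u0d35",      # MALAYALAM LETTER VA
        "sign_i": "\u0d3f",  # MALAYALAM VOWEL SIGN I
        "virama": "\u0d4d",  # MALAYALAM SIGN VIRAMA
    },
    "kn": {  # Kannada
        "ka": "\u0c95",      # KANNADA LETTER KA
        "va": "\u0cb5",      # KANNADA LETTER VA
        "sign_i": "\u0cbf",  # KANNADA VOWEL SIGN I
        "virama": "\u0ccd",  # KANNADA SIGN VIRAMA
    },
}


def to_tokens(text, script):
    """Map each character to its pivot token; unknown characters pass through."""
    reverse = {char: token for token, char in TABLES[script].items()}
    return [reverse.get(char, char) for char in text]


def from_tokens(tokens, script):
    """Map pivot tokens to characters; unknown tokens pass through unchanged."""
    table = TABLES[script]
    return "".join(table.get(token, token) for token in tokens)


def transliterate(text, source, target):
    return from_tokens(to_tokens(text, source), target)


# "kavi" written in Malayalam, converted to Kannada via the pivot tokens.
print(transliterate("\u0d15\u0d35\u0d3f", "ml", "kn"))
```

With a pivot, adding a new script needs one table rather than one table per language pair, which is the main reason to prefer it over routing everything through Malayalam.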

Expertise required:python

Mentor : Vasudev/Jishnu

Internationalize SILPA project with Wikimedia jquery projects, improve the webfonts module in Silpa using jquery.webfonts and provide more Indic and complex fonts as part of it

Project:

Internationalize SILPA :- The SILPA project has many Indic language applications, but as of now it has no built-in tool for typing in Indian languages. Similarly, the application is not internationalized. Both can be achieved by using the jquery.ime and jquery.i18n libraries from Wikimedia. A sample implementation is available on our website. The i18n should be done in the SILPA Flask framework with a nice templating system. Similarly, the interface should use webfonts via the jquery.webfonts library.

Improve the webfonts  :-

  • Currently Silpa provides 36 webfonts. Add more fonts to this collection.
  • Rewrite the webfonts module to use the features of jquery.webfonts
  • Create a repo as per the jquery.webfonts specification
  • Provide a clean api so that other websites can use our webfonts in their websites
  • Document the usage
  • Provide font preview and download options

More Details

Expertise required: jQuery, css, html5, Python , flask , technical understanding about fonts

Mentor : Hrishikesh K B / Sajjad Anwar

Building a system and APIs for accessing and updating Malayalagrandham Bibliography Data

Malayala Grantha Vivaram is a project intended to make available reliable bibliographic information on all Malayalam books published in Kerala and elsewhere. This open data set contains complete bibliographic data from the first impression to 1995. This project aims to add the following features to the Malayalagrandham DB and build it into a bibliography web service:

  • Facility for adding/linking copyright-expired books to Malayala Grantha Vivaram
  • Adding ISBNs and ISBN-based seller discovery
  • Building an interface for publishers through which they can contribute their publication bibliography
  • A similar module for libraries, whose entries will be added to the "found in library" section of each book
  • A module for building a QR code of a bibliography entry with a Malayalagrandham link
  • A crowdsourced way to submit input, with an approval queue interface for submissions
  • MARC21 and MARCXML support
  • A dynamic visualisation interface for browsing book information
  • Proper API and app workflow documentation

Expertise required: Django / Ruby on Rails

Mentor: Anivar

Related Links

Adding Braille Keyboard layouts for Indian Languages to m17n Library

The project is to build support for Bharati Braille keyboard layouts on GNU/Linux systems. The Bharati Braille standard is the official Braille standard in India. A regular QWERTY keyboard is used for data entry, with the S, D, F and J, K, L keys standing for the six dots of Braille. This support needs to be built as m17n layouts. It will enable visually challenged people who have learned Braille layouts to use GNU/Linux systems easily, with the help of audio feedback from TTS.
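
To make the six-key input concrete, here is a minimal sketch that turns a chord of simultaneously pressed keys into a Unicode Braille cell. The key-to-dot assignment (F, D, S for dots 1 to 3 and J, K, L for dots 4 to 6) is an assumption based on the usual Perkins-style convention, not something this page specifies. Mapping the resulting cell to a Bharati Braille letter of a given language would need a per-language table, which is not shown.

```python
# Assumed Perkins-style assignment of home-row keys to Braille dots.
KEY_TO_DOT = {"f": 1, "d": 2, "s": 3, "j": 4, "k": 5, "l": 6}

# The Unicode Braille Patterns block starts at U+2800; dot n sets bit n-1.
BRAILLE_BASE = 0x2800


def chord_to_cell(keys):
    """Convert a chord (iterable of key names) to a Unicode Braille cell."""
    mask = 0
    for key in keys:
        dot = KEY_TO_DOT[key.lower()]
        mask |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + mask)


print(chord_to_cell("f"))       # dot 1 only
print(chord_to_cell("fdj"))     # dots 1, 2 and 4
print(chord_to_cell("sdfjkl"))  # all six dots
```

An m17n layout would express the same chord table declaratively; the bitmask form is only meant to show how little state the mapping needs.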

More Details

Mentor: Anilkumar K V

Developing Malayalam Calendar Support

This project is to add Malayalam calendar support to KDE. KOrganizer has support for many calendars, such as Saka, Hijri and Jalali. The project is to build support for the Malayalam calendar (Kolla Varsham).

The following tasks are part of this work:

  • Build Malayalam (Kollavarsham) calendar support for KDE/KOrganizer.
  • Add holidays and other special times.
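
As a very rough starting point for the year arithmetic only, the sketch below converts a Gregorian date to a Kollavarsham year. It assumes the Kollam era is counted from 825 CE and that the new year (1 Chingam) falls on 17 August; the real date depends on the sun's position and shifts by a day or so, so the fixed cut-off is a loudly labelled approximation. Months and days need the astronomical calculations referred to in this section and are not attempted here.

```python
from datetime import date


def approx_kollavarsham_year(day, new_year=(8, 17)):
    """Approximate Kollavarsham year for a Gregorian date.

    new_year is an assumed fixed (month, day) for 1 Chingam; the true
    date is determined astronomically and varies slightly year to year.
    """
    # Before the new year the offset is 825, from the new year on it is 824.
    offset = 824 if (day.month, day.day) >= new_year else 825
    return day.year - offset


print(approx_kollavarsham_year(date(2024, 1, 1)))
print(approx_kollavarsham_year(date(2024, 9, 1)))
```

Dates within a day or two of the cut-off can come out wrong with this shortcut, which is exactly the gap a proper calendar implementation has to close.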

More Details

Starting Points

Mentor: Praveen A, Mahesh M

Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx

CMU Sphinx is a large-vocabulary, speaker-independent speech recognition codebase and suite of tools that can be used to develop a speech recognition system in any language. To develop an automatic speech recognition (ASR) system for a language, an acoustic model and a language model have to be built for that language. Acoustic models characterize how sound changes over time; they capture the characteristics of the basic recognition units. The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. It attempts to convey the behavior of the language and tries to predict the occurrence of specific word sequences possible in the language. Once these two models are developed, they will be useful to everyone doing research in speech processing. ASR systems using the Sphinx engine have already been developed for the Indian languages Hindi, Tamil, Telugu and Marathi. This project aims to develop an acoustic model and a language model for Malayalam.
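
To show what "the probability of a word sequence" means in practice, here is a minimal bigram language model using plain maximum-likelihood counts. It is a teaching sketch, not the CMU Sphinx tooling: real language models add smoothing and are trained on large corpora, and the tiny corpus below is made up.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(prev, word) estimated from the sentences.

    Each sentence is a whitespace-separated string. <s> and </s> mark
    the sentence boundaries so that start and end words are modelled too.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(words[:-1])          # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))

    def prob(prev, word):
        # Maximum-likelihood estimate: count(prev, word) / count(prev).
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Hypothetical toy corpus of two "sentences".
prob = train_bigram(["a b", "a c"])
print(prob("a", "b"))    # "b" follows "a" in half of the cases
print(prob("<s>", "a"))  # every sentence starts with "a"
```

Unseen word pairs get probability zero here, which is why practical models apply smoothing before they are used in a recogniser.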

Mentor: Deepa P. Gopinath

Background Reading

Enhancement of Indic Language Support in Scribus

Project Description

Scribus is an open source program that brings professional page layout to Linux/UNIX. Find out more about Scribus at http://www.scribus.net. Indic languages are not yet completely supported in Scribus. As a result of work done by a group of developers from India, there is a branch in the Scribus public git repo that supports Indic languages. The aim of this idea is to enhance Indic language support by solving the following issues:

  • Harfbuzz is currently bundled as third-party code in Scribus. Maintaining third-party code is a tedious task, so it should be added as a dependency instead.
  • Enhance hyphenation by adding hyphenation rules.

Expertise Required: Good understanding of C, C++, Qt, Harfbuzz, cmake, etc.

More information

Mentor  : Jain Basil Aliyas (jainbasil in IRC)

Complexity  : Advanced