GSoC/2009: Difference between revisions
Revision as of 04:34, 21 February 2009
Ideas for Google Summer of Code 2009
Guidelines
Information for Students
These ideas were contributed by our developers and users. They are sometimes vague or incomplete. If you wish to submit a proposal based on these ideas, you may wish to contact the developers and find out more about the particular suggestion you're looking at.
Being accepted as a Google Summer of Code student is quite competitive. Accepted students typically have thoroughly researched the technologies of their proposed project and have been in frequent contact with potential mentors. Simply copying and pasting an idea here will not work. On the other hand, creating a completely new idea without first consulting potential mentors is unlikely to work out.
If no specific contact is given, you can ask questions on the SMC discussion mailing list (smc-discuss@googlegroups.com) or in the #smc-project IRC channel on the Freenode.net server.
Even though SMC is a Malayalam computing developer community, students from other language backgrounds or other parts of the world are welcome. Unless otherwise mentioned, knowledge of Malayalam is not required for the projects given below.
Adding a Proposal
When adding an idea to this section, please try to include the following data:
- if the application is not widely known, a description of what it does and where its code lives
- a brief explanation
- the expected results
- pre-requisites for working on your project
- if applicable, links to more information or discussions
- your name and email address for contact (if you're willing to be a mentor)
If you are not a developer but have a good idea for a proposal, get in contact with relevant developers first.
Ideas
Jabber buddy bot with Dict support and KDE Plasmoid with configurable dict server
Brief Description: The project is to develop a Jabber-based bot that connects to a configurable dict server, so that people can add the bot to their Gmail/Jabber contacts and look up the meanings of words. The second part of the project is to make the existing KDE dict plasmoid configurable so that users can connect to any dict server (including one running on localhost). The current KDE dictionary plasmoid is not configurable.
Expectation
- The Jabber bot should be written using the python-dict and python-jabber APIs. The backend server details should be configurable. Automatic authorization of buddy requests should be handled.
- The bot should log the words for which no entry is found on the dict server.
- The KDE desktop dictionary plasmoid should be able to connect to any configured server/port. The patches should be submitted upstream.
Knowledge Prerequisite: Knowledge of Python for writing the bot & Plasmoid. A basic understanding of how DICT works is recommended
Mentor : Rajeesh/Ashik
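As a rough illustration of the lookup step behind the bot, the sketch below talks to a dict server over a plain socket and parses the reply. It deliberately does not use the python-dict or python-jabber APIs named above, so that no library calls are guessed at; the function names, the default host and the logger name are all assumptions made for this example. The status codes (151, 250, 552) and the default port 2628 follow the DICT protocol (RFC 2229).

```python
import logging
import socket

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("dictbot")


def parse_define_reply(lines):
    """Collect definition texts from the reply lines of a DEFINE command."""
    definitions = []
    current = None  # text lines of the definition being read, if any
    for line in lines:
        if current is not None:
            if line == ".":  # a lone dot ends one definition
                definitions.append("\n".join(current))
                current = None
            else:
                # Undo dot-stuffing: a leading ".." stands for a literal "."
                current.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151"):  # one definition follows
            current = []
        elif line.startswith("552"):  # no match for the word
            return []
        elif line.startswith("250"):  # command completed
            break
    return definitions


def lookup(word, host="localhost", port=2628, database="*"):
    """Ask a dict server for a word; log the word if nothing is found.

    Assumes the word contains no double quotes or line breaks.
    """
    with socket.create_connection((host, port), timeout=10) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\r\n")
        reader.readline()  # skip the server banner
        conn.sendall(f'DEFINE {database} "{word}"\r\nQUIT\r\n'.encode("utf-8"))
        lines = [line.rstrip("\r\n") for line in reader]
    definitions = parse_define_reply(lines)
    if not definitions:
        log.info("no entry found for %r", word)
    return definitions
```

Keeping the parser separate from the network code means the bot's reply handling can be tested without a running dict server; a real bot would pass the host and port from its configuration and send the returned text back to the Jabber contact.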
Port Dhvani to Other Operating systems and write Speech dispatcher Driver
Brief Description: Dhvani is a text-to-speech system designed for Indian languages. Its aim is to ensure that literacy and knowledge of English are not essential for using a computer. The first part of the project is to port Dhvani to Windows/Mac. The second part is to write a driver for speech-dispatcher. Dhvani currently works with speech-dispatcher through a generic driver, which is not efficient and cannot use many of Dhvani's features. The new driver should be integrated with Orca so that Dhvani works as a screen reader.
Expectation
- Dhvani should be able to run as a standalone binary on other operating systems. The APIs also need to be tested.
- Dhvani should be able to work as a screen reader with speech-dispatcher.
Knowledge Prerequisite: Good knowledge of C. Familiarity with SDL would be nice, but not required.
Material Prerequisite: A computer with Windows/Mac OS development setup in addition to GNU/Linux
Mentor: Santhosh Thottingal
Indic Calendrical Calculation Library
Brief Description: Create a calendrical calculation library that is usable from Python and C++ programs. Writing a C/C++ library and a Python wrapper for it is enough. The library can be used to display Indian calendars on free desktops.
Expectation
- Design a Calendar Library for Indian Languages.
- At least one calendar should be functional at the end of the project. The student can choose the calendar (e.g. the Kollavarsha, Bengali or Tamil calendar).
- Integrate the calendar into the KDE calendar system.
Knowledge Prerequisite: Knowledge of C/C++, Python and mathematics.
Mentor : Praveen A/Santhosh Thottingal
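The idea does not prescribe a design. One common approach in calendrical libraries, assumed here purely for illustration, is to convert every calendar to and from a single running day count, so that each new calendar needs only two conversion functions. The sketch below shows the Gregorian side of such a scheme in Python; the function names are this example's own, and the rules for Kollavarsha or any other Indian calendar are intentionally left out.

```python
from datetime import date  # used only to cross-check the arithmetic


def is_gregorian_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def fixed_from_gregorian(year, month, day):
    """Return the day count on which 1 January of year 1 is day 1."""
    prior_years = year - 1
    # Days in all complete years before this one, with leap-day corrections.
    days = (365 * prior_years
            + prior_years // 4
            - prior_years // 100
            + prior_years // 400)
    # Days in the complete months of this year, as if February had 30 days.
    days += (367 * month - 362) // 12
    # Correct for February's real length once it has passed.
    if month > 2:
        days -= 1 if is_gregorian_leap_year(year) else 2
    return days + day


if __name__ == "__main__":
    # The standard library uses the same day count, so the two should agree.
    print(fixed_from_gregorian(2009, 2, 21), date(2009, 2, 21).toordinal())
```

With this in place, a second calendar would add its own pair of functions to and from the same day count, and converting a date between calendars becomes one call in each direction.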
Functional Optical Character Recognition System
Brief Description: Malayalam (or any other Indian language) does not have a working Optical Character Recognition system. Many people have researched this field, but none of the efforts has been successful. Tesseract OCR seems promising, and work is under way for Bengali. Building on that work, we need to add Malayalam support to Tesseract OCR.
Expectation:
- Study the Tesseract OCR system
- Recognition of all characters
- Add support for Malayalam and optimize the accuracy
More details: http://code.google.com/p/tesseract-ocr/ and http://code.google.com/p/ocropus/
Mentor: TBD
New Family of Equal Height Fonts (EHF) for Malayalam Language
Brief Description: Design and create a new family of Equal Height Fonts for the traditional Malayalam script. Following Roman typography, serif and sans-serif font variations are available in Malayalam. Equal-width fonts such as Courier, which are available in Roman typography, are impossible for Malayalam characters and are also unnecessary. The proposed Equal Height Fonts are a new concept in the history of font making, intended to surmount the typographical challenge of vertically stacked conjuncts.
Knowledge Prerequisite: Understanding of OpenType/TrueType font design technologies and experience with tools like FontForge
Mentor: Hussain K H
Batch converter for documents (doc/odt) with ASCII Font encoded data to Unicode Documents
Brief Description: Many documents in India have their content encoded in non-standard ASCII fonts. The aim of the project is to enhance our existing ASCII-to-Unicode converter, Payyans, so that it can read doc and odt documents and perform the conversion using the existing APIs.
Expectation:
- Payyans should be able to convert .doc documents to Unicode encoded ODT documents
- Batch conversion as well as single copy conversion should be possible
- APIs should be provided for developers
- Should support almost all ASCII fonts. Supporting the maps present in Padma converter is recommended.
Knowledge Prerequisite: Students should know Python.
Mentor: Rajeesh Nambiar/Nishan Naseer
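For readers new to the problem, the sketch below shows the general shape of an ASCII-font to Unicode conversion in Python. It is not Payyans' code or API. The glyph map is invented for this example and does not describe any real font; real maps, such as those used by Payyans or Padma, are much larger. The sketch assumes two steps: replace font glyph codes with Unicode text, trying the longest code first, then move vowel signs that are drawn to the left of a consonant so that they follow it, as Unicode's logical order requires.

```python
# Hypothetical glyph map: these pairs are invented for illustration only.
GLYPH_MAP = {
    "k": "\u0d15",               # KA
    "kk": "\u0d15\u0d4d\u0d15",  # KA + VIRAMA + KA (a conjunct)
    "m": "\u0d2e",               # MA
    "e": "\u0d46",               # VOWEL SIGN E
    "E": "\u0d47",               # VOWEL SIGN EE
}

# Malayalam vowel signs that are drawn before the consonant they follow.
PREBASE_SIGNS = {"\u0d46", "\u0d47", "\u0d48"}


def starts_with_consonant(unit):
    """True if the unit begins with a Malayalam consonant (KA..HA)."""
    return "\u0d15" <= unit[0] <= "\u0d39"


def ascii_to_unicode(text, glyph_map=GLYPH_MAP):
    """Convert font-encoded text to Unicode using the given glyph map."""
    max_len = max(len(key) for key in glyph_map)
    units = []
    i = 0
    # Step 1: substitute glyph codes, preferring the longest match.
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in glyph_map:
                units.append(glyph_map[chunk])
                i += size
                break
        else:
            units.append(text[i])  # unmapped characters pass through
            i += 1
    # Step 2: move each pre-base vowel sign after the unit that follows it.
    j = 0
    while j < len(units) - 1:
        if units[j] in PREBASE_SIGNS and starts_with_consonant(units[j + 1]):
            units[j], units[j + 1] = units[j + 1], units[j]
            j += 2
        else:
            j += 1
    return "".join(units)
```

Because the reordering works on whole mapped units, a vowel sign typed before a conjunct ends up after the complete conjunct. Two-part vowel signs and other font-specific rules are not handled here. A batch converter would apply a function of this kind to the text runs extracted from each doc or odt file.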