User:Yash/Application

From SMC Wiki
Revision as of 11:56, 2 May 2013 by Yash

Improving cross language transliteration system

Who are you?

- Name : Yash Sinha

- University: Birla Institute of Technology & Science, Pilani, Rajasthan.

- College: Birla Institute of Technology & Science, Pilani Campus, Pilani.

- Current Education: M.Sc. (Tech.) Information Systems. (2012-2016)

- Background:

I am Yash, pursuing Information Systems at BITS Pilani. I am from Hazaribag, a small town in Jharkhand, India. I am a geek who does what he wants.

I started to learn Java during my school days and built a really cool application, Dhamaal Calculator, which apart from computing numbers had features like a splash screen, Hindi/Sanskrit language support and a system tray icon. I went on to learn C and C++ to increase the speed of my programs for coding challenges. I qualified for the final round of the ACM-Indian Coding League held at BITS Pilani as part of APOGEE, the technical fest of BITS. After that, I also learnt Python, matplotlib (a plotting library) and wxPython (a UI library), and received a certificate from Massachusetts Institute of Technology for this through edx.org. Last month, I learnt about OpenGL on edx.org.

Besides all this, I also like chemistry and am interested in surface phenomena. I made a project on adsorbing carbon from the ambient atmosphere by capturing it at the surface of a resin, which was selected for the CBSE National Level Science Exhibition at New Delhi.

I also carry a legacy of Indian Classical Music from my family. I am a tabla player and I have performed at National Level Youth Festivals more than five times.

- Contact:

- GitHub Username: yash-sinha
- Email: mail.yash.sinha@gmail.com
- IRC Handle: #sinhayash
- Blog: sinhayash.wordpress.com 


What is your programming experience?

1. What platform do you use to code? What editor do you prefer and why?

I use both Windows and Linux (Ubuntu 12). Java and Python are my favourite languages, but I am also good at C and C++. For C and C++, I use CodeBlocks and Visual Studio; for Python I prefer Ninja/IDLE, and for Java I use NetBeans.

2. How well can you use Malayalam, and how good are your Malayalam reading and typing skills?

I have no experience with Malayalam. I do have friends who are fluent in Malayalam and who can help me understand the script if the need arises.

3. Tell us about something you have created.

i. In class 11, I created an application called “Dhamaal Calculator” in Java. It had all the features of a scientific calculator, along with a background changer, a look and feel selector, support for Hindi and Sanskrit, a splash screen and a system tray icon.

ii. I made a Hangman game in Python in which one player thinks of a word and the other tries to guess it by suggesting letters or numbers.

iii. I also made a Word Scrabble game in Python in which the player forms meaningful words from a given pool of letters and scores points for them. It also has an option to play against the computer.

iv. I made a simulation program in Python that stochastically modelled the virus population in a patient’s body and plotted graphs of the resulting data.

4. What makes you excited about SMC? Have you worked before with SMC or another open source project as a contributor? If yes, when and on what?

I have not worked with any other open source project before. I have not contributed to SMC formally, but I have made some minor contributions and solved the issue raised for GSoC beginners. Ever since my school days, I have been enthusiastic about non-English language support in computer applications, which led me to add Hindi and Sanskrit support to Dhamaal Calculator. SMC’s goal of promoting Indic languages and eventually producing a language module for the Python community appeals to me. This is the main reason I would like to work with SMC. I would like to learn and contribute to SMC even if I am not selected for GSoC.

5. Have you ever used git or any other version control system?

I had not used git before this project. I first used it while setting up the silpa repository, and I am learning it quickly.


Others

1. Did you participate in past GSoC programs? If so, which years and which organizations?

No, I have not been part of GSoC before.

2. Do you have other obligations between May and August ? Please note that we expect the Summer of Code to be a full time, 40 hour a week commitment.

I have no other obligations between May and August. I plan to work 6 to 7 hours every day and take Sundays off (7 hours × 6 days = 42 hours a week).

3. Will you continue contributing to or supporting Swathanthra Malayalam Computing after the GSoC 2013 program? If yes, which area(s) are you interested in?

Yes, I would like to contribute to improving the transliteration system.

4. Have you communicated with a potential mentor? If so, who?

I have communicated with Vasudev Kamath (copyninja on IRC).

5. SMC Wiki link of your proposal

wiki.smc.org.in/User:Yash/Application

Why should we choose you over other applicants?

I have been enthusiastic about the silpa project on IRC right from the beginning. I started discussing and learning about it from the very first hour after Google announced SMC as a mentoring organization. I have a fair knowledge of the silpa source code and I love to code in Python. For me, it would be a lifetime achievement to work on such a project and ultimately benefit the Python community. I have already cloned the repository (after some initial hiccups) and have tried adding Hindi dictionaries and normalizations, the details of which I have posted on the wiki.

What is your project?

I want to improve the cross-language transliteration system by adding support for Hindi as an intermediate language. I would also try to use the tools of the CLDR project to improve the transliteration system.

Implementation plan

As has been done for Malayalam and Kannada, I will implement the transliteration functions for Hindi. I will include features of the Hindi script such as Chandrabindu, Chandra, Anusvara, Visarga, Rra, Llla, Udatta, Anudatta, Danda and Za. I would also add normalizations to cover sounds like au (as in the English word "awe"), da (as in "pahaad" in Hindi) and Om. I will also work out how to use the CLDR tools to improve the transliteration system, and I will try to incorporate the Levenshtein edit distance algorithm to carry out transliteration more accurately.
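As a rough illustration of the edit distance idea mentioned in the plan, here is a minimal Python sketch of the Levenshtein distance. How the distance would actually be used inside silpa is not decided in this proposal; the `closest_candidate` helper and the sample words are hypothetical and only show one possible use, namely picking the nearest spelling from a list of candidates.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def closest_candidate(word, candidates):
    """Hypothetical helper: return the candidate with the smallest
    edit distance to word (the first one wins on ties)."""
    return min(candidates, key=lambda c: levenshtein(word, c))


print(levenshtein("kitten", "sitting"))                # 3
print(closest_candidate("pahad", ["pahaad", "bahar"]))  # pahaad
```

Python 3 strings are sequences of code points, so in Devanagari text a vowel sign or virama counts as its own character when the distance is computed.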


Proposal Timeline:

Before May 3 (Application deadline):

o Set up the required environment at my workplace. ✓
o Join the mailing list. ✓
o Set up my blog and wiki page. ✓
  o www.sinhayash.wordpress.com
  o wiki.smc.org.in/User:Yash
o Get to know the community and its working style.
o Familiarize myself with the project's code, documentation and test system.

May 3 – May 27:

o Learn the git version control system.
o Try adding Hindi dicts and Hindi transliteration functions to the module to further improve my understanding of the code.

May 28 – June 17 (Before the official coding time):

o Get familiar with the Python modules Flask, Jinja2, Werkzeug and virtualenv.
o Use this time to discuss and finalize any changes to the existing set of deliverables.
o Stay in touch with my mentor and the community through IRC and the mailing lists during this period, and become absolutely clear about the desired final results.

June 18 – July 2 (Official coding period starts):

o Add Hindi as an intermediate language for transliteration.
o Add dictionaries for vowels, consonants and vowel symbols.
o Add normalizations.
o Finalize how to use the CLDR tools to improve the transliteration system.
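To make the dictionary step more concrete, here is a minimal sketch of what vowel, consonant and vowel-symbol dictionaries could look like, together with a small function that uses them. This is not silpa's actual data layout or API: the table names, the `transliterate_hindi` function and the ad hoc Latin spellings are all assumptions made for illustration, and the tables cover only a handful of letters. Nukta forms, normalizations and the dropping of the final "a" in spoken Hindi are left out.

```python
# Hypothetical, partial tables: Devanagari code point -> ad hoc Latin spelling.
VOWELS = {  # independent vowels
    "\u0905": "a", "\u0906": "aa", "\u0907": "i", "\u0908": "ii",
    "\u0909": "u", "\u090A": "uu", "\u090F": "e", "\u0910": "ai",
    "\u0913": "o", "\u0914": "au",
}
VOWEL_SIGNS = {  # dependent vowel symbols (matras)
    "\u093E": "aa", "\u093F": "i", "\u0940": "ii", "\u0941": "u",
    "\u0942": "uu", "\u0947": "e", "\u0948": "ai", "\u094B": "o",
    "\u094C": "au",
}
CONSONANTS = {
    "\u0915": "k", "\u0917": "g", "\u0924": "t", "\u0926": "d",
    "\u0928": "n", "\u092A": "p", "\u092E": "m", "\u092F": "y",
    "\u0930": "r", "\u0932": "l", "\u0938": "s", "\u0939": "h",
}
SIGNS = {  # chandrabindu, anusvara, visarga (spellings are assumptions)
    "\u0901": "n", "\u0902": "n", "\u0903": "h",
}
VIRAMA = "\u094D"  # suppresses the inherent vowel of a consonant


def transliterate_hindi(text):
    """Map Devanagari text to Latin letters using the tables above.
    Characters missing from the tables are passed through unchanged."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # vowel sign replaces inherent "a"
                i += 1
            elif nxt == VIRAMA:
                i += 1                        # virama: no vowel at all
            else:
                out.append("a")               # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# "namaste" written in Devanagari, given as escapes to avoid encoding issues.
print(transliterate_hindi("\u0928\u092E\u0938\u094D\u0924\u0947"))  # namaste
```

Keeping the three dictionaries separate matters because a consonant carries an inherent "a" unless a vowel symbol or a virama follows it, so the lookup for a consonant has to peek at the next character.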

July 3 - July 31:

o Prepare for midterm evaluation

Aug 2: Midterm evaluation

Aug 2 – Sep 2:

o Add CLDR tools to the transliteration system.
o Write a detailed test suite.
o Final review.

Sep 16:

o Documentation.
o I have kept a buffer of two weeks for any unforeseen delays.