CDAC-Inscript-Critique: Difference between revisions

From SMC Wiki
Revision as of 15:08, 3 December 2009

Comments on the new Inscript keyboard layout proposal

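Full Proposal: http://groups.google.com/group/smc-discuss/attach/94da86c1ba474b0d/Inscript_Proposal_New.pdf?part=2

Contributors: Jaisen Nedumpala, Praveen Arimbrathodiyil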

Draft

Input methods are a very important component of any digital system. Any change to them has far-reaching consequences for compatibility with existing data. Blindly including all the new characters of Unicode 5.1+ without a clear solution to the issues those versions introduce is premature.

The foreword says: "These new features had marked repercussions on storage as well as inputting and an urgent need was felt for a revision whereby each and very new character introduced in Unicode would be accommodated on the keyboard and a uniform manner of entering data as well as storing data would be devised." This is self-contradictory, because the new characters introduced in Unicode 5.1 and above have themselves introduced multiple ways of entering and storing the same data. For example, chillu characters can now be entered in two different ways, and the 'nta' conjunct (a named sequence) adds to the confusion of the two existing ways of encoding nta: the correct way, based on the language's conjunct formation (na + chandrakala + rra), and the incorrect way followed by Microsoft (na + chandrakala + ZWJ + rra). With the existing Inscript keyboard layout, chillu characters are entered and stored in a uniform manner; the proposed changes will allow chillu characters to be entered and stored in two mutually incompatible ways. The proposal does not mention how to deal with the existing encoding of chillu characters under Unicode 5.1, or how to deal with two different encodings of the same chillu.
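To make the incompatibility concrete, here is a minimal sketch using chillu N as the example. The helper name `same_after_normalization` is mine, not something from the proposal; the code points are the standard ones. It only shows that the two stored forms are different strings and that none of the standard Unicode normalization forms unifies them.

```python
import unicodedata

VIRAMA = "\u0d4d"  # chandrakala (MALAYALAM SIGN VIRAMA)
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Chillu N stored in the two ways discussed above.
chillu_n_joiner = "\u0d28" + VIRAMA + ZWJ  # na + chandrakala + ZWJ (joiner-formed)
chillu_n_atomic = "\u0d7b"                 # MALAYALAM LETTER CHILLU N (atomic, Unicode 5.1)


def same_after_normalization(a: str, b: str) -> bool:
    """Return True if any standard normalization form makes a and b identical."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


# Plain comparison fails, and normalization does not help either.
print(chillu_n_joiner == chillu_n_atomic)
print(same_after_normalization(chillu_n_joiner, chillu_n_atomic))
```

Both lines print False: to any program comparing the stored text, the two spellings of the same chillu are unrelated strings.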

On page 2, case b draws an incorrect conclusion about the underlying problem. The problems described in cases a and b are examples of ZWJ and ZWNJ use. This has absolutely nothing to do with which input method is used. The problem arises when the fonts used for display expect different sequences for the same conjunct. The encoding of the Malayalam conjunct 'nta' is a classic example. Irrespective of the input method, be it a keyboard layout like Inscript or Lalitha or a transliteration method like Varamozhi or Swanalekha, the encoding ultimately depends on the font the user has. Users of Microsoft's Karthika font enter 'nta' as na + chandrakala + ZWJ + rra whatever input method they use, and users of other fonts that follow the rules of the language enter it as na + chandrakala + rra.
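The two 'nta' sequences named above can be written out code point by code point. This is only an illustrative sketch: the constant names and the `describe` helper are my own, and which font expects which sequence is as described in the text, not something the code checks.

```python
import unicodedata

# The two stored forms of 'nta' described above.
NTA_LANGUAGE_RULE = "\u0d28\u0d4d\u0d31"    # na + chandrakala + rra
NTA_WITH_ZWJ = "\u0d28\u0d4d\u200d\u0d31"   # na + chandrakala + ZWJ + rra


def describe(text: str) -> list[str]:
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in (("language rule", NTA_LANGUAGE_RULE), ("with ZWJ", NTA_WITH_ZWJ)):
    print(label)
    for line in describe(seq):
        print("   ", line)

# The only difference between the two is the joiner in the middle.
print(NTA_WITH_ZWJ.replace("\u200d", "") == NTA_LANGUAGE_RULE)
```

Nothing in either sequence records which keyboard layout or transliteration scheme produced it, which is the point made above: the stored form follows the font's expectations, not the input method.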

This approach, solving the problem of multiple encodings by standardizing the keyboard layout, is destined to fail. Dual encoding is introduced by Unicode for the characters (basic and named sequences) included in its tables, and by font creators for conjuncts in the private area (nta, for example). Even if the proposed Inscript layout changes are accepted, users will ultimately decide which encoding to use through their choice of Unicode version and Malayalam font. The only approach with any chance of success would be to allow just one way of entering a given character, but that would break compatibility with the old version of the Inscript layout as well as with different versions of Unicode.

Some problems are enumerated correctly, but the proposed solutions do not address them and in many cases add to them. Take search: when one more way of encoding chillu characters is added, the user has to search for every possible encoding to get the desired results. What does "porting and transferring a document input by one mechanism across OS’s or even within the same OS." even mean? Does it mean exchanging documents encoded in variants of the standard? Even then the deciding factor is the font; once the document is created, the role of the input method has ended. Exchange of data depends only on the underlying encoding, in this case Unicode. The only open question is about variations such as the different nta sequences or chillus (atomic or joiner-formed), and those are handled by the font.
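The search problem can be sketched as follows. The word used is അവൻ ("he") written both ways. `fold_chillus` and `fold_search` are hypothetical helpers of mine showing one possible workaround (rewriting atomic chillus as consonant + chandrakala + ZWJ before comparing); neither the proposal nor this critique prescribes it, and a real search tool would have to decide for itself which form to fold to.

```python
VIRAMA = "\u0d4d"  # chandrakala
ZWJ = "\u200d"     # zero width joiner

# Atomic chillu -> base consonant of the corresponding joiner-formed sequence.
ATOMIC_CHILLU_BASE = {
    "\u0d7a": "\u0d23",  # chillu NN -> nna
    "\u0d7b": "\u0d28",  # chillu N  -> na
    "\u0d7c": "\u0d30",  # chillu RR -> ra
    "\u0d7d": "\u0d32",  # chillu L  -> la
    "\u0d7e": "\u0d33",  # chillu LL -> lla
    "\u0d7f": "\u0d15",  # chillu K  -> ka
}


def fold_chillus(text: str) -> str:
    """Rewrite every atomic chillu as consonant + chandrakala + ZWJ (hypothetical helper)."""
    return "".join(
        ATOMIC_CHILLU_BASE[ch] + VIRAMA + ZWJ if ch in ATOMIC_CHILLU_BASE else ch
        for ch in text
    )


def fold_search(needle: str, haystack: str) -> bool:
    """Substring search that treats both chillu encodings as the same text."""
    return fold_chillus(needle) in fold_chillus(haystack)


document = "\u0d05\u0d35\u0d7b"                # a + va + atomic chillu N
query = "\u0d05\u0d35\u0d28" + VIRAMA + ZWJ    # a + va + na + chandrakala + ZWJ

print(query in document)             # plain search across encodings
print(fold_search(query, document))  # search after folding
```

The plain search prints False and the folded search prints True. Without something like the folding step, every application that searches Malayalam text has to try each encoding separately.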

1) u+0D02 ം (അനുസ്വാരം) is present in the existing Inscript map, in the position of Normal x. But the proposal document comments on it: "Not present on current Inscript map. Placement according to generic place of Anuswara". I cannot understand why it is described that way. Can anybody give an explanation? Similarly,

2) u+0D36 ശ is present in the existing Inscript map in the position of Normal-Shifted m, and it is a frequently used letter. Then why is it commented as "Newly added non-frequent. Placement generic to all languages"?

3) u+0D4C ൌ is present in the current map in the position of normal q, and is in frequent use. In the proposal it is moved to the position of ext q. This change is unacceptable, as it will make typing unnecessarily difficult.

4) u+0D57 ൗ, which is a duplicate encoding of the sign ൌ, is introduced in the position (normal q) currently held by u+0D4C ൌ. This change is unacceptable because it breaks compatibility between data entered with the existing keyboard layout and data entered with the proposed one.

Note: The changes described in 3) and 4) violate "Principle 8: Backward Compatibility", which the document itself sets out on page 7.

5) u+0D0C ഌ was not mapped earlier in Inscript. It is now mapped (in the position of Ext shifted f), which is a desirable move. I don't know what they mean by the comment "Backward Compatibility" on this.

6) u+0D62 ൢ (as in കഌ) was also not mapped earlier in Inscript. It is now mapped in the position of Ext f.

Note: Regarding changes 5) and 6), I feel that their positions should be arranged more conveniently and logically. When typing official documents in Malayalam, words like സംഘം കഌപ്തം (company limited), കഌപ്തപ്പെടുത്തല്‍ (limiting), കഌപ്തസംഖ്യ (limited amount) etc. appear often. So these characters should be placed in the 2nd layer rather than in the extended layer.

7) u+0D61 ൡ is proposed in the position of Ext shifted r, and u+0D63 ൣ in the position of Ext r; neither was present in the existing Inscript layout. Even though these letters are not in frequent use, their inclusion is desirable.

8) u+0D79 ൹ (Malayalam date mark) is mapped on Ext-shifted v. It should be in the second layer, as it is used frequently, especially when drafting letters. It could be placed on shifted v, as no letter is mapped to the shift v position in the existing Inscript layout.

9) The positions proposed for all the chillus in the new proposal are unacceptable. Chillaksharams are very frequently used and are already mapped in the existing Inscript keyboard layout. The proposed positions differ greatly from the existing layout. Everyone from primary school students to government office employees and people in the DTP industry is familiar with the existing logical layout for chillaksharams. The new proposal breaks the backward compatibility it promises, as mentioned in the note on point 4). Besides,

10) The position of normal \ is proposed for u+0D7C (Atomic chillu RR), where ZWNJ is mapped in the existing Inscript keyboard layout. This change makes no sense and is therefore unacceptable.

11) In the proposal document, u+0D7D (Atomic chillu L) is proposed for the position of normal shifted full stop. This position is occupied by the symbol >, which is frequently used in mathematics. This change will create unnecessary difficulty for school children who are already familiar with the existing layout. Again, it breaks backward compatibility.

12) Similarly, u+0D7E (Atomic chillu LL) is proposed for the position of normal shifted 8, where * lies. Again, this is a symbol frequently used in mathematics as well as in programming. This change will create unnecessary difficulty, especially for school children.

13) No explanation is given for why ക്‍ is excluded. It is already mapped in the existing Inscript keyboard layout.

14) The comment column for the atomic chillus in the Inscript keyboard layout is filled with the sentence "Placed as per Kerala Govt. Gazette (Vol.46 Thiruvananthapuram Dated. 18th Dec. 2001 No.2023) G.O. (Rt) No. 93/2001/ITD. dated 2-6-2001"

We all know that atomic chillus were introduced in Unicode 5.1, without regard to the factual objections raised by people who really know Malayalam. This version of Unicode was released in April 2008. How, then, could a government order issued much earlier (in June 2001) have proposed mapping atomic chillus on the keyboard? I don't think the Hon'ble Secretary to the Government foresaw the future inclusion of these code points and issued an order for them. I cannot understand the logic. If anybody knows about it, please explain.

As far as I know, the Government of Kerala has a Malayalam Computing project right now. My comments on Malayalam Inscript are based on the keyboard layout published on the website of this project. It is here:

Inscript.jpg

I am curious to know what is really in that G.O. If anybody has a copy, please post it to the list.

If I am wrong somewhere, please correct me.

15) The proposed mappings of ZWJ (shift 9) and ZWNJ (shift 0) are also questionable, as they are already mapped in the positions of \ and ] respectively. This change breaks backward compatibility.

16) Regarding the Caps Lock key, the document says: "After due discussions with OS developers it was decided that the Toggling key between layer 1 and layer 3 will be different for different Operating Systems. Though toggling between layer 1 and layer 2 (which is English layer) will remain through the Caps-Lock key as mentioned in Annexure-D of “Bureau of Indian Standard document for ISCII-91” which is as follows: “The Inscript overlay gets selected when Caps-Lock is active, otherwise normal lower case English overlay gets selected.” "

I cannot understand the logic, because the Bureau of Indian Standard document for ISCII-91 is an age-old document. It was prepared at a time when the user base was much smaller and computers offered far fewer facilities than they do today. In my personal opinion, it would be a nice option to let the user customise the toggling keys. The OS I use has this facility.
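Coming back to the backward-compatibility objections in points 3), 4), 11) and 12): they can be checked mechanically by comparing the two layouts key by key. The following is a rough sketch only. The key labels (`"q"`, `"shift+."`, `"shift+8"`), the dictionary names and the `remapped_keys` helper are my own, and the tables contain just the three positions taken from those points, not the full layouts.

```python
# Partial layout tables, filled in only from points 3), 4), 11) and 12) above.
EXISTING = {
    "q": "\u0d4c",        # vowel sign AU on normal q (point 3)
    "shift+.": ">",       # as stated in point 11
    "shift+8": "*",       # as stated in point 12
}

PROPOSED = {
    "q": "\u0d57",        # AU length mark on normal q (point 4)
    "shift+.": "\u0d7d",  # atomic chillu L (point 11)
    "shift+8": "\u0d7e",  # atomic chillu LL (point 12)
}


def remapped_keys(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {key: (old_output, new_output)} for every key whose output changes."""
    return {
        key: (old[key], new[key])
        for key in old
        if key in new and old[key] != new[key]
    }


for key, (before, after) in sorted(remapped_keys(EXISTING, PROPOSED).items()):
    print(f"{key}: U+{ord(before):04X} -> U+{ord(after):04X}")
```

Every key that shows up in this list is a position where a user trained on the existing layout will silently produce different data with the proposed one.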