CDAC-Inscript-Critique: Difference between revisions

From SMC Wiki
Revision as of 12:47, 6 December 2009

Comments on the new Inscript keyboard layout proposal

Full Proposal Contributors: Jaisen Nedumpala, Praveen Arimbrathodiyil

Draft

Input methods are a very important component of any digital system. Any change has far-reaching consequences for compatibility with existing data. Blindly including all the new characters of Unicode 5.1+ without a clear solution to the issues brought by the new versions is premature.

The foreword says "These new features had marked repercussions on storage as well as inputting and an urgent need was felt for a revision whereby each and very new character introduced in Unicode would be accommodated on the keyboard and a uniform manner of entering data as well as storing data would be devised." This is self-contradictory, because the new characters introduced in Unicode 5.1 and above have introduced multiple ways of entering and storing the same data. For example, chillu characters can be entered in two different ways, and the 'nta' conjunct (a named sequence) adds to the confusion of the two existing ways of encoding nta (the correct way, based on the language's conjunct formation, i.e. na + chandrakala + rra, and the incorrect way followed by Microsoft, i.e. na + chandrakala + ZWJ + rra). With the existing Inscript keyboard layout, chillu characters are entered and stored in a uniform manner; the proposed changes will allow chillu characters to be entered and stored in two mutually incompatible ways. There is no mention of how to deal with the existing encoding of chillu characters under Unicode 5.1, or of how to deal with the two different encodings of chillu characters.
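The two chillu encodings can be compared directly. The following is a minimal Python sketch, using chillu N as the example; the helper names `codepoints` and `same_after_normalization` are ours and are not part of the proposal. It suggests that none of the standard Unicode normalization forms makes the joiner-formed sequence and the atomic character equal.

```python
import unicodedata

# Chillu N encoded in the two ways discussed above.
JOINER_CHILLU_N = "\u0D28\u0D4D\u200D"  # na + chandrakala + ZWJ (existing encoding)
ATOMIC_CHILLU_N = "\u0D7B"              # atomic chillu N (Unicode 5.1)

def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

def same_after_normalization(a, b):
    """True if a and b become identical under any standard normalization form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )

print(codepoints(JOINER_CHILLU_N))
print(codepoints(ATOMIC_CHILLU_N))
print(same_after_normalization(JOINER_CHILLU_N, ATOMIC_CHILLU_N))
```

Since normalization does not bridge the two forms, any software comparing text has to handle the equivalence itself.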

On page 2, case b draws an incorrect conclusion about the underlying problem. The problems described in cases a and b are examples of ZWJ and ZWNJ use. This has absolutely no relation to which input method you use. The problem arises when the fonts used for display use different sequences for displaying the same conjunct. The encoding of the Malayalam conjunct 'nta' is a classic example of this. Irrespective of the input method used, be it a keyboard layout like Inscript/Lalitha or a transliteration method like Varamozhi or Swanalekha, the encoding ultimately depends on the font used by the user. Users of Microsoft's Karthika font enter/encode 'nta' as na + chandrakala + ZWJ + rra irrespective of the input method used, and users of other fonts that follow the rules of the language enter/encode it as na + chandrakala + rra.
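As an illustration, the two 'nta' sequences named above can be written out in Python. This is only a sketch: the word എന്റെ is our own choice of example, and the variable names are hypothetical. The stored text differs by one code point regardless of which input method produced it.

```python
NA = "\u0D28"           # ന
CHANDRAKALA = "\u0D4D"  # ്
RRA = "\u0D31"          # റ
ZWJ = "\u200D"          # zero width joiner

NTA_PLAIN = NA + CHANDRAKALA + RRA           # na + chandrakala + rra
NTA_WITH_ZWJ = NA + CHANDRAKALA + ZWJ + RRA  # na + chandrakala + ZWJ + rra

# The same word stored under each convention.
word_plain = "\u0D0E" + NTA_PLAIN + "\u0D46"
word_zwj = "\u0D0E" + NTA_WITH_ZWJ + "\u0D46"

def describe(text):
    """Render text as a '+'-separated list of code points."""
    return " + ".join(f"U+{ord(ch):04X}" for ch in text)

print(describe(word_plain))
print(describe(word_zwj))
print(word_plain == word_zwj)  # the two stored forms are not equal
print(len(word_plain.encode("utf-8")), len(word_zwj.encode("utf-8")))
```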

This approach of solving the problem of multiple encodings by standardizing the keyboard layout is destined to fail. Dual encoding is introduced by Unicode for characters (basic and named sequences) included in its tables, and by font creators for conjuncts in the private area (nta, for example). Even if the proposed Inscript layout changes are accepted, users will ultimately decide which encoding to use based on their choice of Unicode version and Malayalam font. If there is any chance of success at all, it lies in allowing only one way of entering a particular character, but that would mean breaking compatibility with the old version of the Inscript layout as well as with different versions of Unicode. Since the problem originates in Unicode and in fonts, the solution also has to happen there.

Some problems are enumerated correctly, but the solutions do not address them or, in many cases, add to the problem. Take the case of search: when one more method of encoding chillu characters is added, the user will have to search for every possible encoding to get the desired results. What does "porting and transferring a document input by one mechanism across OS’s or even within the same OS." even mean? Does it mean exchanging documents encoded in variants of the standard? Even in that case the deciding factor is the font; once the document is created, the role of the input method has ended. Exchange of data depends only on the underlying encoding, in this case Unicode. The only question is about variations like the different nta or chillu encodings (atomic or joiner-formed), but that is handled by the font.
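To make the search problem concrete, here is a rough Python sketch of what search software would have to do once both chillu encodings are in circulation: fold one encoding into the other before comparing. The function names are hypothetical, the table covers only five chillus, and the base consonants are taken from the Unicode documentation as we understand it, so treat it as an assumption rather than a reference mapping.

```python
CHANDRAKALA = "\u0D4D"
ZWJ = "\u200D"

# Atomic chillu -> base consonant of its joiner-formed encoding.
CHILLU_BASE = {
    "\u0D7A": "\u0D23",  # chillu NN -> ണ
    "\u0D7B": "\u0D28",  # chillu N  -> ന
    "\u0D7C": "\u0D30",  # chillu RR -> ര
    "\u0D7D": "\u0D32",  # chillu L  -> ല
    "\u0D7E": "\u0D33",  # chillu LL -> ള
}

# Translation table: each atomic chillu becomes base + chandrakala + ZWJ.
FOLD_TABLE = {
    ord(atomic): base + CHANDRAKALA + ZWJ for atomic, base in CHILLU_BASE.items()
}

def fold_chillus(text):
    """Rewrite atomic chillus as joiner-formed sequences."""
    return text.translate(FOLD_TABLE)

def search_contains(haystack, needle):
    """Substring search that treats both chillu encodings as the same."""
    return fold_chillus(needle) in fold_chillus(haystack)

atomic_word = "\u0D05\u0D35\u0D7B"              # അവൻ with atomic chillu N
joiner_word = "\u0D05\u0D35\u0D28\u0D4D\u200D"  # same word, joiner-formed chillu

print(atomic_word in joiner_word)                 # plain search misses the match
print(search_contains(joiner_word, atomic_word))  # folded search finds it
```

Without such folding in every search tool, the user is left to try each encoding by hand, which is the burden described above.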

1) U+0D02 ം (അനുസ്വാരം) is present in the existing Inscript map, in the position of normal x. Yet the proposal document comments on it: "Not present on current Inscript map. Placement according to generic place of Anuswara".

2) U+0D36 ശ is present in the existing Inscript map in the position of normal shifted m, and it is a frequently used letter. The comment "Newly added non-frequent. Placement generic to all languages" does not make sense.

3) U+0D4C ൌ is present in the current map in the position of normal q and is in frequent use. In the proposal it is moved to the position of ext q. This change is unacceptable, as it will break compatibility with the existing layout.

4) U+0D57 ൗ is introduced at the existing position (normal q) of the sign U+0D4C ൌ, of which it is a duplicate encoding. This change is unacceptable because it will break compatibility between data entered using the existing keyboard layout and data entered with the proposed keyboard layout.

Note: The proposed changes mentioned as 3) and 4) above violate "Principle 8: Backward Compatibility", which the document offers on its page 7.
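The incompatibility in points 3) and 4) can be checked with Python's standard `unicodedata` module. This is a small sketch; the syllable built on മ is our own example. The same key (normal q) would produce U+0D4C under the existing layout and U+0D57 under the proposed one, and text typed under the two layouts does not compare equal even after normalization.

```python
import unicodedata

AU_SIGN = "\u0D4C"         # ൌ, on normal q in the existing layout
AU_LENGTH_MARK = "\u0D57"  # ൗ, on normal q in the proposed layout
MA = "\u0D2E"              # മ

typed_with_existing = MA + AU_SIGN
typed_with_proposed = MA + AU_LENGTH_MARK

print(unicodedata.name(AU_SIGN))
print(unicodedata.name(AU_LENGTH_MARK))

# Neither composed nor decomposed normalization makes the two forms equal.
for form in ("NFC", "NFD"):
    a = unicodedata.normalize(form, typed_with_existing)
    b = unicodedata.normalize(form, typed_with_proposed)
    print(form, a == b)
```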

5) U+0D0C ഌ was not mapped earlier in Inscript. Now it is mapped (in the position of ext shifted f), and that is a desirable move. The comment "Backward Compatibility" on this is incorrect.

6) U+0D62 ൢ (as in കഌ) was also not mapped earlier in Inscript. It is now mapped in the position of ext f.

Note: Considering changes 5) and 6) above, their positions should be arranged more conveniently and logically, because words like സംഘം, കഌപ്തം (company limited), കഌപ്തപ്പെടുത്തല്‍ (limiting), കഌപ്തസംഖ്യ (limited amount) etc. appear often when typing official documents in Malayalam. So it is desirable to place them in the 2nd layer rather than in the extended layer.

7) U+0D61 ൡ is proposed in the position of ext shifted r, and U+0D63 ൣ in the position of ext r; neither was present in the existing Inscript layout. Even though these letters are not in frequent use, the new inclusion is desirable.

8) U+0D79 ൹ (Malayalam date mark) is mapped on ext shifted v. It should be in the second layer, as it is frequently used, especially while drafting letters. It can be positioned on shifted v, as no letter is mapped in the shift v position in the existing Inscript layout.

9) The positions proposed for all the chillus in the new proposal are unacceptable. Chillaksharams are very frequently used and are already mapped on the existing Inscript keyboard layout. The proposed positions differ largely from the existing layout. Everyone from primary school students to employees in government offices and people in the DTP industry is familiar with the existing logical layout for the chillaksharams. The new proposal breaks the backward compatibility it offers, as mentioned in the note after point 4) above. Besides:

10) The position of normal \ is proposed for U+0D7C (atomic chillu RR), where ZWNJ is mapped in the existing Inscript keyboard layout. This change makes no sense and is thus unacceptable.

11) In the proposal document, U+0D7D (atomic chillu L) is proposed in the position of normal shifted full stop. This position is occupied by the symbol >, which is frequently used in mathematics. This change will create unnecessary difficulty for school children, who are already familiar with the existing layout. Again, it breaks backward compatibility.

12) Similarly, U+0D7E (atomic chillu LL) is proposed in the position of normal shifted 8, where * lies. Again, this is a frequently used symbol in mathematics as well as in programming. This change will create unnecessary difficulty, especially for school children.

13) No comment is given on why ക്‍ is excluded. It is already mapped in the existing Inscript keyboard layout.

14) The comment column for the atomic chillus in the Inscript keyboard layout is filled with the sentence "Placed as per Kerala Govt. Gazette (Vol.46 Thiruvananthapuram Dated. 18th Dec. 2001 No.2023) G.O. (Rt) No. 93/2001/ITD. dated 2-6-2001"

We all know that atomic chillus were introduced in Unicode 5.1, without considering the factual arguments made by the people who really know Malayalam. This version of Unicode was released in April 2008. How, then, could a government order released much earlier (in June 2001) propose the mapping of atomic chillus on the keyboard? We don't think that the Hon'ble Secretary to the Government foresaw the future inclusion of these code points and issued an order accordingly. The logic is hard to follow. This needs explanation.

The Government of Kerala currently has a Malayalam Computing project.

These comments on Malayalam Inscript are based on the keyboard layout released on the website of this project. It is here:

http://malayalam.kerala.gov.in/images/7/78/Inscript.jpg

It would be useful for this discussion to get a copy of this G.O.

15) The proposed mappings of ZWJ (shift 9) and ZWNJ (shift 0) are also questionable, as they are already mapped in the positions of \ and ] respectively. This change breaks backward compatibility.

16) Regarding the Caps Lock key, the document says: "After due discussions with OS developers it was decided that the Toggling key between layer 1 and layer 3 will be different for different Operating Systems. Though toggling between layer 1 and layer 2 (which is English layer) will remain through the Caps-Lock key as mentioned in Annexure-D of “Bureau of Indian Standard document for ISCII-91” which is as follows: “The Inscript overlay gets selected when Caps-Lock is active, otherwise normal lower case English overlay gets selected.” "

This logic is not easy to understand, because the Bureau of Indian Standards document for ISCII-91 is an age-old document. It was prepared at a time when the user base was much smaller and computers offered fewer facilities than they do today. It would be a nice option to let users customise the toggling keys. GNU/Linux already has this facility.
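Returning to the backward compatibility objections in points 3), 4), 11) and 12), a check of this kind could be automated. The following Python sketch is only illustrative: it encodes just the three key positions whose old and new outputs are stated in those points, the key labels such as "shift+." are hypothetical names, and `remapped_keys` is our own helper, not something from the proposal.

```python
# Partial layouts, limited to positions described in points 3), 4), 11) and 12).
# Key labels are hypothetical names used only in this sketch.
EXISTING = {
    "q": "\u0D4C",   # ൌ
    "shift+.": ">",
    "shift+8": "*",
}
PROPOSED = {
    "q": "\u0D57",        # ൗ
    "shift+.": "\u0D7D",  # atomic chillu L
    "shift+8": "\u0D7E",  # atomic chillu LL
}

def remapped_keys(existing, proposed):
    """Return {key: (old_output, new_output)} for keys whose output changes."""
    return {
        key: (existing[key], proposed[key])
        for key in existing
        if key in proposed and existing[key] != proposed[key]
    }

for key, (old, new) in sorted(remapped_keys(EXISTING, PROPOSED).items()):
    print(f"{key}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

A layout that claims backward compatibility should yield an empty result when the full existing and proposed maps are compared in this way.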