Language Impediments to E-Governance – Problems and Solutions – I

Part-I: Problem definition

– Dr U B Pavanaja

Introduction

So much has been written about the digital divide in India. People have talked tirelessly about the need to take the benefits of Information Technology (IT) to the common man. No one is going to question the validity of the statement “IT for common man has to be in his language”. Many state governments and the central government of India are spending a lot of money on E-Governance. Let us consider the problems in implementing E-Governance projects in India, and their solutions.

E-Governance solutions can be divided into two segments. The first covers the many intra-departmental activities that happen in all government offices. Huge databases exist in every office, and they are used by officers who are well versed in English. An Indian language component need not be part of the computerization of these activities. The second segment is where the government deals with citizens. Be it land records, birth certificates or complaints and redressals, an Indian language component has to exist in the computerization of G-to-C (Government to Citizen) activities.

There are two main impediments to the implementation of G-to-C projects in India. They are the mindset of the people involved and the technical problems involved in using Indian languages in IT. This article deals with the second problem in detail. I invite discerning readers to address the first problem!

There are many Indian language solution vendors. Most of them use their own proprietary encodings, which hampers the portability of data across different users and platforms. The user is stuck with a particular vendor for life. The problem multiplies when there are multiple installations spread across different locations.

Solution Requirements

The five major requirements to be met by an Indian language solution are:

1. Portability of data. The data entered and stored in one system should be accessible and readable by every system in the enterprise. No system should need filters or converters to read this data.
2. Availability of data as data. Most Indian language vendors store the data not as storage codes but in proprietary font encodings. Data stored this way is of no use other than for printing and publishing. No database programming is possible with this kind of data. Data that is keyed in has to be stored in a standard encoding for Indian language data. ISCII and Unicode are two such standards.
3. Longevity of data. The data stored should remain useful for a long period and should not become obsolete over the years.
4. Universal acceptance of data. The encoding used for data storage should follow an international standard. This becomes relevant in the future, as handheld devices and thin-client computing become more popular. The devices that may appear in the future should immediately be able to understand the data. This is possible only if an international standard is followed.
5. Replicability. The solution created for one Indian language should be easily replicable for other languages and places. Even though some applications may not need more than one Indian language at a time, it is advisable to make the application easily modifiable to accommodate more languages. This is possible by employing a modular approach where the language-specific information comes from resource files, one file per language.
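
The modular approach in requirement 5 can be sketched as below. This is only a minimal illustration, not a prescribed design: the `RESOURCES` dictionary stands in for one resource file per language, and the key `name`, the function `label` and the sample strings are hypothetical.

```python
# Hypothetical per-language resources. In a real application each inner
# dictionary would live in its own resource file (one file per language).
RESOURCES = {
    "en": {"name": "Name"},
    "kn": {"name": "ಹೆಸರು"},
    "hi": {"name": "नाम"},
}


def label(key: str, lang: str, fallback: str = "en") -> str:
    """Return the text for `key` in `lang`, or in `fallback` if missing."""
    table = RESOURCES.get(lang, {})
    if key in table:
        return table[key]
    return RESOURCES[fallback][key]
```

With this arrangement, supporting one more language means adding one more table; the code that calls `label("name", "kn")` does not change.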

Indian Languages on Computers

Indian languages have come a long way in computers. Our languages have a history of over two thousand years. About two decades ago Indian languages made their first appearance on computers. In those days computers were used mainly for printing and publishing. A typical application included some fonts, keyboard drivers and a page-composing package. There were no database applications. After a few years, database applications made their appearance. They were mainly based on 8-bit ISCII.

ISCII

ISCII (Indian Script Code for Information Interchange) is based on ASCII (American Standard Code for Information Interchange). Here lower ASCII, i.e., codes up to 127, is used for English. Upper ASCII, i.e., codes from 128 to 255, is used for one Indian language. ISCII can therefore accommodate only one Indian language apart from English at a time. All Indian languages share a single code table in ISCII, so the same code stands for the corresponding character in every language. That means the ISCII codes for Kannada ka and Hindi ka are the same. There is no separate table for the sorting order. Sorting is supposed to follow the position of the characters in the ISCII table. All languages of India are thus supposed to follow one sorting order, which is not true in reality.
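
The layout described above can be sketched in a few lines. This is not an ISCII decoder: the table holds a single entry, the byte value `0xB3` for ka is an assumption made for illustration, and the function names are hypothetical. Only the two Unicode block start values are standard; Unicode characters are used here merely to name the letter that each script would display.

```python
# Start of each script's block in Unicode (standard values).
UNICODE_BLOCK_START = {"devanagari": 0x0900, "kannada": 0x0C80}

# Hypothetical one-entry table: the shared upper-half byte for "ka" and the
# offset of KA inside a Unicode script block. 0xB3 is assumed for illustration.
KA_BYTE = 0xB3
UPPER_HALF_OFFSETS = {KA_BYTE: 0x15}


def decode_byte(byte: int, script: str) -> str:
    """Decode one 8-bit code: ASCII below 128, script-dependent above."""
    if not 0 <= byte <= 255:
        raise ValueError("expected an 8-bit value")
    if byte < 128:
        return chr(byte)
    return chr(UNICODE_BLOCK_START[script] + UPPER_HALF_OFFSETS[byte])


def sort_by_table_position(words: list[bytes]) -> list[bytes]:
    """Sort encoded words purely by code value, i.e. by table position."""
    return sorted(words)
```

Here `decode_byte(KA_BYTE, "devanagari")` and `decode_byte(KA_BYTE, "kannada")` give the ka of two different scripts from the same stored byte, so the byte alone does not say which language it belongs to. `sort_by_table_position` shows the only ordering such data offers: the order of the codes themselves.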

Solutions based on ISCII used to be dependent on TrueType fonts. TrueType fonts have a limitation on the number of glyphs (an individual display element is called a glyph, and a collection of glyphs makes a font). Indian languages have a unique feature. The basic alphabet consists of consonants and vowels, and thousands of combinations of these consonants and vowels are possible. If we wanted a unique glyph in the font to display every such possible combination, the total number of glyphs in that font would cross 15,000. Indian language vendors arrived at a compromise set of glyphs. They had around 150-190 glyphs for each language.
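
A rough count shows how quickly the combinations grow. The figures used below (34 consonants, 14 vowel forms, clusters of at most two consonants) are illustrative assumptions, not the actual counts for any particular language, and `combination_count` is a hypothetical helper.

```python
def combination_count(consonants: int, vowel_forms: int, max_cluster: int) -> int:
    """Count consonant clusters of length 1..max_cluster, each combined
    with one of `vowel_forms` vowel forms."""
    total = 0
    for length in range(1, max_cluster + 1):
        total += (consonants ** length) * vowel_forms
    return total


single = combination_count(34, 14, 1)  # 34 * 14 = 476
double = combination_count(34, 14, 2)  # 476 + 34 * 34 * 14 = 16660
```

Even with clusters limited to two consonants, the count under these assumptions is already beyond 15,000, far more than the 150-190 glyphs vendors settled on.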

Problems with ISCII

Many vendors were providing Indian language applications and solutions based on TrueType fonts. There was no font standard for Indian languages. ISCII is only a data storage standard and not a font standard. The majority of Indian language vendors were providing solutions where the data was written in a font encoding and not in ISCII. This was hampering the spread of Indian language usage on computers, as data written by one vendor’s software was not readable by another vendor’s software. Even though ISCII existed as a data storage standard, not many vendors provided solutions based on it. As already mentioned, ISCII also had its own limitations, such as not being able to accommodate more than one Indian language at a time and improper sorting.
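
The difference between a font encoding and a storage code can be sketched as follows. Both vendor tables are invented for illustration and do not describe any real product; each maps a stored byte to the glyph occupying that slot in the vendor's font, shown here as the Unicode character that the glyph draws.

```python
# Two hypothetical vendor font encodings (byte -> glyph at that font slot).
VENDOR_A_FONT = {0x41: "\u0C95", 0x42: "\u0CA8"}  # Kannada KA, NA
VENDOR_B_FONT = {0x41: "\u0CA8", 0x42: "\u0C95"}  # same slots, swapped


def render(data: bytes, font: dict[int, str]) -> str:
    """Show what font-encoded bytes look like when drawn with `font`."""
    return "".join(font.get(b, "?") for b in data)


stored = bytes([0x41, 0x42])              # written with vendor A's software
as_written = render(stored, VENDOR_A_FONT)
as_read = render(stored, VENDOR_B_FONT)   # opened with vendor B's software
```

The stored bytes are identical in both cases, yet `as_written` and `as_read` differ. The bytes carry meaning only together with one vendor's font, which is why such data cannot be shared, searched or sorted reliably.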

Bottlenecks

Due to these problems, many decision makers at various government agencies have resorted to the easier route of not using Indian languages at all in their E-Governance projects. Some have adopted the even simpler solution of not computerizing at all! There is another category of people who have stuck with proprietary font-based solutions and now face the problem of not meeting the five major requirements mentioned above.

[http://vishvakannada.com/node/164|Part-II of this article] will deal with the solutions for these problems.

Resources and links:-
1. [http://www.cdac.in/html/gist/down/iscii_d.asp|ISCII Document]
2. [http://tdil.mit.gov.in|Technology Development for Indian Languages]
3. [http://www.bhashaindia.com|Bhasha India web-site]
