Language Impediments to E-Governance – Problems and Solutions -II

Part-II: The Solution

– Dr U B Pavanaja

Introduction

In Part-I of this article we saw how hacked, proprietary font-based solutions hampered the growth of Indian language computing. Each of these solutions added a layer on top of the Operating System (OS) and did not integrate well with it. Computer users all over the world have started adopting the universal character-encoding standard called Unicode. Systems with built-in Unicode support at the OS level do not need an added layer for Indic support.

Unicode

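Unicode is a character-encoding standard that assigns a unique number, called a codepoint, to every character. It was originally designed as a 16-bit code. The code space now extends beyond 16 bits, although all the major Indic scripts lie within the first 65,536 codepoints and Windows stores text in 16-bit units (UTF-16). The Unicode Consortium, which maintains the standard, is a worldwide body whose members include government and industry representatives. Almost every script in use is given its own range of codepoints, called a block (loosely, a codepage). A block contains the codepoints, viz., the Unicode values, for the basic characters of that particular script. The major Indic scripts have blocks of 128 codepoints each, so consecutive blocks start at an offset of 128 (decimal) from one another. Unicode is a data storage standard and not a font standard. Sorting is defined separately from the character codes, in collation tables that can be tailored for each language. Hence every language can have its own sorting order. Marathi and Hindi use the same script, but their collation orders are different. Unicode meets this requirement easily because the character table and the sorting tables are separate. Data stored in Unicode is displayed using OpenType fonts. An OpenType font is not restricted to the 256 glyphs of an 8-bit font encoding; it can hold up to 65,535 glyphs, which is ample for the conjuncts and ligatures of an Indic script. Data written by one software package as Unicode can easily be read by another, because the data is not written in a font encoding but as Unicode data. There is also no limit on the number of languages that can be used together with English in the same document.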
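The block layout described above can be checked with a few lines of Python. This is only a small sketch: the block start values are those listed in the Unicode charts, while the names `BLOCK_START`, `KA_OFFSET` and `letter_ka` are introduced here purely for illustration.

```python
import unicodedata

# First codepoint of each script's block, as listed in the Unicode charts.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}

# The letter KA sits at the same position inside each of these blocks.
KA_OFFSET = 0x15


def letter_ka(script):
    """Return the letter KA of the given script (illustrative helper)."""
    return chr(BLOCK_START[script] + KA_OFFSET)


if __name__ == "__main__":
    for script, start in BLOCK_START.items():
        ka = letter_ka(script)
        print(f"{script:<11} block starts at U+{start:04X}; "
              f"U+{ord(ka):04X} {ka} is {unicodedata.name(ka)}")
```

Because the blocks share a common layout, the same letter is found at the same offset in each script, which is what makes simple script-to-script transliteration tools feasible.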

Indic on computers

The main ingredients of Indic support on computers are an input method editor (IME, popularly known as a keyboard driver), a font for displaying the data, a standardized data storage format, and a collation scheme that follows the Indian language sorting order. These four elements together enable Indian language support on the system. A fifth component, the user interface (UI) in the Indian language, adds to the visual experience. These components need not be, and should not be, tied to each other. In other words, a user should have the freedom to choose an IME from vendor X and a font from vendor Y. Unicode fulfills these requirements.
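To see why collation can be a separate component from data storage, consider the following Python sketch. The same stored Unicode strings are sorted twice, once by raw codepoint and once with a tailored key. The tailoring rule used here (treating the conjunct KSSA as a single letter that sorts after every other letter) is a toy assumption made for the demonstration; it is not the actual Hindi or Marathi collation table, and the function names are hypothetical.

```python
# Three Devanagari words, written with escapes so the codepoints are explicit.
KAMAL = "\u0915\u092E\u0932"               # KA MA LA
KSHAMA = "\u0915\u094D\u0937\u092E\u093E"  # KA VIRAMA SSA MA AA
GAGAN = "\u0917\u0917\u0928"               # GA GA NA

KSSA = "\u0915\u094D\u0937"  # the conjunct KA + VIRAMA + SSA


def code_point_key(word):
    """Sort key that simply follows the stored codepoint values."""
    return [ord(ch) for ch in word]


def tailored_key(word):
    """Toy tailoring (an assumption): KSSA counts as one letter sorting last."""
    weights = []
    i = 0
    while i < len(word):
        if word.startswith(KSSA, i):
            weights.append(0x0980)  # one past the Devanagari block
            i += len(KSSA)
        else:
            weights.append(ord(word[i]))
            i += 1
    return weights


if __name__ == "__main__":
    words = [KSHAMA, GAGAN, KAMAL]
    print("codepoint order:", sorted(words, key=code_point_key))
    print("tailored order: ", sorted(words, key=tailored_key))
```

The stored data is identical in both cases; only the sorting table differs, which is how two languages sharing one script can each have their own order.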

Part-I of this article listed five major requirements that an Indian language solution for an E-Governance project must meet. As one can deduce, Unicode-based solutions meet all of them. Microsoft was one of the early implementers of Unicode for Indian languages in its OS and other applications. Since more than 95% of computer users in India use Microsoft's OS and applications, let us consider the Indian language features of these.

Microsoft and Indic

Microsoft introduced Unicode-based Indian language support in Windows 2000. Two scripts, viz., Devanagari and Tamil, were introduced in that release. Windows XP brought in more languages. Windows XP and 2003 support Hindi, Sanskrit, Marathi, Konkani, Nepali, Gujarati, Punjabi, Tamil, Telugu and Kannada. Service Pack 2 (SP2) for Windows XP added support for Bengali and Malayalam. Office XP and 2003 support these languages as well.

Indian language support is built into the operating system itself and is not an added layer. One can even have filenames in Indian languages. Windows has locale settings through which month names, weekdays, currency, etc. can appear in an Indian language. Keyboard drivers to input text in Indian languages are also available. Since people are already familiar with a number of keyboard layouts, most of the popular layouts have been provided. For those who want to change an existing keyboard layout or create a totally new one, a keyboard layout creator is available for download. Browsing Indian language web-sites created using Unicode is made simple by Internet Explorer, a web-browser which supports Unicode.
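The point about filenames can be tried out with a short Python sketch. It assumes an operating system and file system with Unicode filename support (as described above for Windows); the file name, the text and the helper `save_note` are made up for the illustration.

```python
import tempfile
from pathlib import Path


def save_note(directory, name, text):
    """Write text as UTF-8 to a file whose name is in an Indian script."""
    path = Path(directory) / name
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Both the file name and the contents are in Kannada.
        path = save_note(tmp, "ಕನ್ನಡ.txt", "ನಮಸ್ಕಾರ")
        print(path.name, "->", path.read_text(encoding="utf-8"))
```

No font or vendor-specific driver is involved in storing or naming the file; a font is needed only when the name or the contents are displayed.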

Microsoft's flagship Office suite, the latest version being 2003, consists of Word, Excel, Outlook, PowerPoint, Access, FrontPage and other modules, and has full Unicode-based support for Indian languages. Word is used for documentation, publishing, etc. FrontPage is used for building web-sites. Excel is a spreadsheet and Access is a database. Spell-check and Autocorrect features are now available for Indian languages as well. Word, Excel and PowerPoint have a new feature called Smart Tags. A smart tag recognizes a word or phrase and triggers a pre-defined action. For example, a bank employee can use a banking smart tag which recognizes a banking term entered in English and replaces it with the equivalent Indian language word. Since sorting follows the Indian language collation sequence, database applications are now possible. Since the data is stored in the standard Unicode format and not in a non-standard font encoding, there is no fear of the data becoming unreadable over the years. For enterprise applications, there is SQL Server as the database server. For collaborative work and portals there is SharePoint, which consists of a portal server and team services. Workflow automation and paperless offices can be implemented using these.
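The difference between storing Unicode data and storing a font encoding can be illustrated at the byte level. The Python sketch below is only an illustration and says nothing about how Office itself stores its files; the helpers `store` and `load` are hypothetical names. It encodes a Kannada word in two standard Unicode encoding forms and reads it back.

```python
# The word "Kannada" written in Kannada script, with explicit codepoints.
TEXT = "\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"


def store(text, encoding="utf-16-le"):
    """Turn text into bytes using a standard Unicode encoding form."""
    return text.encode(encoding)


def load(data, encoding="utf-16-le"):
    """Turn stored bytes back into text."""
    return data.decode(encoding)


if __name__ == "__main__":
    for enc in ("utf-16-le", "utf-8"):
        data = store(TEXT, enc)
        print(f"{enc:<9} {data.hex(' ')}")
        # Any program that knows the encoding recovers the same text.
        print("          round trip ok:", load(data, enc) == TEXT)
```

The bytes depend only on the published encoding form and the codepoints, never on which font or vendor was used to type the text, so any Unicode-aware program can read them back.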

Hindi and Tamil UIs are available for Windows XP. Each of these is actually a Language Interface Pack (LIP), in which the 20% of commands that are used 80% of the time are translated. The Office suite is luckier. A complete Hindi version is available now, in which the entire suite, including the UI, is in Hindi. LIPs for Windows XP and UIs for the Office suite are also available in other Indian languages.

Visual Studio .NET (VS.NET) is the development suite from Microsoft. The fact that even the Linux community is adopting .NET shows its worth. VS.NET supports Indian languages using Unicode. One can use VB.NET, VC++.NET, C#.NET, etc. to develop any application for the .NET Framework. The application developed can be deployed on any system which has the .NET Framework installed. Since the environment is Unicode, one can have Indian language text in menus, textboxes, error messages, dialog boxes, etc.

Resources and links:-
1. Unicode Consortium
2. Unicode charts
3. Opentype fonts
4. Software globalization
5. Internationalization
6. Technology Development for Indian Languages
7. Bhasha India web-site
8. Microsoft India
9. Indian language application development employing .NET
10. Part-I of this article
