The National Education Policy, inter alia, refers to India’s extremely rich literature in classical languages like Sanskrit, Tamil, Telugu, Kannada, Malayalam and Odia. It emphasises the need to preserve these priceless works, along with the immortal works of other languages like Pali, Persian and Prakrit, for the enrichment of posterity, and hopes that future generations will benefit from such classical literature. Such resources will also be made widely available in schools, possibly as online modules, through an innovative approach.
The policy document describes how Sanskrit contains vast treasures of Mathematics, Medicine, Architecture, Philosophy, Grammar, Music, Metallurgy, Drama, Poetry and Story-telling, and how this wide spectrum of knowledge is known as the Sanskrit Knowledge System. The policy document says Sanskrit will be offered at all levels of school and higher education as an important, enriching option for students. But for this to happen, India needs to move beyond the present meagre 0.01% presence of Indian-language content on the internet.
Why our content presence on the internet is nearly nonexistent needs discussion. In 2020, Indian language computing completed its 50th year. The first integrated Devanagari computer was developed in 1983. The Indian script encoding standard ISCII (Indian Script Code for Information Interchange), along with the keyboard layout standard INSCRIPT, was officially released by BIS (Bureau of Indian Standards) in 1988. This standard was developed by a group of scientists at IIT Kanpur who studied all major Indo-Aryan scripts, their behaviour and the needs of computing, and evolved the standard over a period of one and a half decades. The ISCII standard document released by BIS was not a list of characters alone. It covered every aspect of the script properties and outlined the principles and rules that must govern script behaviour in computing, so that the implementation is unambiguous and efficient.
Text display is the technology that turns the digital representation of text into a readable visual representation. The font display formats are defined by Microsoft and Adobe and released as the OpenType standard. This standard does not implement Odia script rules and therefore creates ambiguous and noisy rendering. For example, କ+ୌ is the same as କ+େ+ୗ. This results in a lot of ambiguous text that cannot be searched, sorted or processed by algorithms for machine learning or text processing. Moreover, the format is so complex that even Adobe's own PDF software doesn't implement it. That makes all Odia PDF documents nonstandard, non-searchable and non-quotable. The complexity of this font format has also made it impossible for calligraphers to design fonts for Odia, so publishers cannot create searchable digital content. There are hardly any Indian language publishers to be found. Those that exist use legacy nonstandard software.
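The kind of ambiguity described above can be illustrated with a minimal sketch. It assumes the example refers to the letter KA followed either by the single vowel sign AU (U+0B4C) or by the two-part sequence of vowel sign E (U+0B47) and the AU length mark (U+0B57); the variable and function names are illustrative only. Both sequences look identical on screen, yet a plain string comparison treats them as different text unless the software normalises them first.

```python
import unicodedata

# Code points assumed for the example (names are from the Unicode charts).
KA = "\u0b15"        # ORIYA LETTER KA
SIGN_AU = "\u0b4c"   # ORIYA VOWEL SIGN AU, stored as one code point
SIGN_E = "\u0b47"    # ORIYA VOWEL SIGN E
AU_MARK = "\u0b57"   # ORIYA AU LENGTH MARK

precomposed = KA + SIGN_AU          # two code points
decomposed = KA + SIGN_E + AU_MARK  # three code points, same appearance


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalisation (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A naive comparison, as a simple search or sort would do it.
raw_equal = precomposed == decomposed
# The same comparison after normalising both sides.
normalised_equal = same_text(precomposed, decomposed)

print(raw_equal, normalised_equal)
```

Search, sort and text-processing tools that skip the normalisation step would treat the two spellings as different words, which is one plausible way the unsearchable text described above can arise.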
The Odia character set needed standardisation. The Odia support we see today, however, is governed by the Unicode Consortium, and the state of Odisha has no guidelines of its own. While Odia is taught and learnt the same way by all students in schools, what gets implemented on computers is different. The number of characters in Odia Unicode has been changing over the years, and these changes make their way into the software we use on our phones and computers. Students learning from Barnabodha today, like their teachers and parents before them, learn the same Odia letters, which never had to evolve; Odia users on computers and phones, by contrast, are confused by the characters in Unicode every day.
We may not be in a position to change, or may not gather enough motivation to change, the input hardware (the keyboard), which was designed for the English language only and was then adopted for other languages across the world. However, comparing English typing with Odia typing is unfair. Today, Odia users are a lot more comfortable writing with pen and paper than with a keyboard, whereas the opposite is true of English users. So it is natural that Odia use will remain confined to pen and paper.
Since the OpenType format was designed for use with Unicode text, and Unicode omitted the properties and characteristics of the characters it encoded, OpenType failed to define unambiguously the rules to be applied. That made font design for different languages extremely complex and still unreliable.
Odia writing is learnt in three steps: the Barnamala, the matras and the Juktakshars. How these three steps translate into "typing" needs to be taught and practised. Since we have neither digital literacy in schools nor a standard Odia input method for mobile phones or tablets, all users are forced to look at a QWERTY English keyboard and try to figure out how to type Odia. Even a native Odia keyboard on a mobile phone or tablet does not implement the three steps.
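As a rough illustration of how the three steps could be recognised in digital text, the sketch below labels each unit of a word as barnamala (a standalone letter), matra (a vowel sign) or juktakshar (consonants joined by the halanta). The code point ranges, the labels and the function names are assumptions made for this example; they are not taken from any standard or existing input method, and a real keyboard design would need far more care.

```python
HALANTA = "\u0b4d"  # the sign that joins consonants into a conjunct


def is_consonant(ch: str) -> bool:
    # Main consonant range plus a few later additions (assumed ranges).
    return "\u0b15" <= ch <= "\u0b39" or ch in "\u0b5c\u0b5d\u0b5f\u0b71"


def is_vowel(ch: str) -> bool:
    # Independent vowels, which are also part of the barnamala.
    return "\u0b05" <= ch <= "\u0b14" or ch in "\u0b60\u0b61"


def is_matra(ch: str) -> bool:
    # Dependent vowel signs written around a consonant.
    return "\u0b3e" <= ch <= "\u0b4c" or ch in "\u0b56\u0b57\u0b62\u0b63"


def learning_steps(word: str) -> list[tuple[str, str]]:
    """Split a word into units and label each with its learning step."""
    units = []
    i = 0
    while i < len(word):
        ch = word[i]
        if is_consonant(ch):
            # Absorb every following "halanta + consonant" pair.
            j = i + 1
            while (j + 1 < len(word) and word[j] == HALANTA
                   and is_consonant(word[j + 1])):
                j += 2
            label = "juktakshar" if j - i > 1 else "barnamala"
            units.append((word[i:j], label))
            i = j
        else:
            if is_vowel(ch):
                label = "barnamala"
            elif is_matra(ch):
                label = "matra"
            else:
                label = "other"
            units.append((ch, label))
            i += 1
    return units


# Example word: KA + halanta + SSA, vowel sign E, TA + halanta + RA.
kshetra = "\u0b15\u0b4d\u0b37\u0b47\u0b24\u0b4d\u0b30"
labels = [label for _, label in learning_steps(kshetra)]
print(labels)
```

A keyboard or teaching tool built around the three steps would need this kind of mapping in the other direction, from what the learner intends to write to the sequence that gets stored.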
India started creating digital content in the late 1980s and moved to entirely digital publishing within a decade. The internet came in 1995 but did not support Indian languages for about 15 years. All legacy content became incompatible with Unicode. So we have little content on the internet.
Architectural changes are needed in the existing operating systems so that indigenous innovation in language tools, such as spell checkers and grammar checkers, can be freely integrated. This will also facilitate a wider proliferation of Indian-language content on the internet. It is also the way to preserve our languages.
This very important matter has been badly delayed. It is a sheer fallacy that the knowledge domain on the internet should be accessed only through the English language. That would deprive millions of Indians of the advantages of the internet in their own language. Surely no one would want that. It is time the State Government and the Government of India jointly examined the issues involved and took up the matter with Unicode for a satisfactory resolution of the technical issues.
(DISCLAIMER: This is an opinion piece. The views expressed are the author’s own and have nothing to do with OTV’s charter or views. OTV does not assume any responsibility or liability for the same. The author can be reached at firstname.lastname@example.org)