My family’s native language, which I grew up speaking, is far from a niche language. Bengali is the seventh most common native language in the world, sitting ahead of the eighth (Russian) by a wide margin, with as many native speakers as French, German, and Italian combined.

And yet, on the Internet, Bengali is very much a second-class citizen – as are Arabic (#5), Hindi (#4), and Mandarin (#1) – any language which is not written with the Latin alphabet.

The very first version of the Unicode standard did include Bengali. However, it left out a number of important characters. Until 2005, Unicode did not have one of the characters in the Bengali word for “suddenly”. Instead, people who wanted to write this everyday word had to combine three separate, unrelated characters. For English-speaking teenagers, combining characters in unexpected ways, like writing ‘w’ as ‘//’, used to be a way of asserting technical literacy through “l33tspeak” – a shibboleth for nerds that derives its name from the word “elite”. But Bengalis were forced to make similar orthographic contortions just to write a simple email: ত + ্ + ‍ = ‍ৎ (the third character is the invisible “zero width joiner”).
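For readers who want to see the sequence rather than take it on faith, here is a minimal Python sketch using only the standard `unicodedata` module. It assumes the dedicated character in question is U+09CE, the one shown on the right-hand side above. The names `workaround`, `khanda_ta`, and `describe` are illustrative and not from the original essay.

```python
import unicodedata

# The workaround described above: three code points typed in sequence.
workaround = "\u09a4\u09cd\u200d"  # ta + virama (hasanta) + zero width joiner
# The dedicated character (assumed here to be U+09CE).
khanda_ta = "\u09ce"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for label, text in (("workaround", workaround), ("single character", khanda_ta)):
    print(label, describe(text))

# Check whether NFC normalization maps the three-character form onto the
# single character; if it does not, software sees two different strings.
print(unicodedata.normalize("NFC", workaround) == khanda_ta)
```

One practical consequence: text typed with the three-character workaround and text typed with the single character may render alike while still failing to match in a search or a string comparison.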

Even today, I am forced to do this when writing my own name. My name is not only a common Indian name, but one of the top 1,000 names in the United States as well. But the final letter has still not been given its own Unicode character, so I have to use a substitute…

I am not the only one who has trouble writing their name correctly in Unicode. Linguistically, East Asian languages such as Chinese, Japanese, and Korean have distinct writing systems. Some (but not all) of the characters trace their lineage back to a common set, but even these characters, known as Han characters, began to diverge and evolve independently over two thousand years ago.

The Unicode Consortium has launched a very controversial project known as Han Unification: an attempt to create a limited set of characters that will be shared by these so-called “CJK languages.” Instead of recognizing these languages as having their own writing systems that share some common ancestry, the Han unification process views them as mere variations on some “true” form.

To help English readers understand the absurdity of this premise, consider that the Latin alphabet (used by English) and the Cyrillic alphabet (used by Russian) are both derived from Greek. No native English speaker would ever think to try “Greco Unification” and consolidate the English, Russian, German, Swedish, Greek, and other European languages’ alphabets into a single alphabet. Even though many of the letters look similar to Latin characters used in English, nobody would try to use them interchangeably. ҭЋаt ωoulδ βε σutragєѳuѕ.

Even though our language is exempt from this effort, Han unification is particularly troubling for Bengali speakers to hear about. The rhetoric is a blast from our own colonial past, when the British referred to Indian languages pejoratively as “dialects”. Depriving their colonial subjects of distinct linguistic identities was a key tactic in justifying their brutal rule over an “uncivilized” people.

I Can Text You A Pile of Poo, But I Can’t Write My Name – Aditya Mukerjee

(via gothhabiba)

The whole story is well worth a read; it’s jarring sometimes how these ‘antiquated’ ideas of linguistic colonialism still thrive in the digital age.

(via transliterations)
