Non-Latin domain names coming soon

Wow, this’ll be a huge change. ICANN is getting ready to allow internationalized domain names: web-site domain names written in non-Latin scripts rather than the regular Western alphabet.

Internationalized Domain Names (IDNs) are domain names represented by local language characters. Such domain names could contain letters or characters from non-ASCII scripts (for example, Arabic or Chinese). Many efforts are ongoing in the Internet community to make domain names available in character sets other than ASCII.

These “internationalized domain name” (IDN) efforts were the subject of a 25 September 2000 resolution by the ICANN Board of Directors, which recognized “that it is important that the Internet evolve to be more accessible to those who do not use the ASCII character set,” and also stressed that “the internationalization of the Internet’s domain name system must be accomplished through standards that are open, non-proprietary, and fully compatible with the Internet’s existing end-to-end model and that preserve globally unique naming in a universally resolvable public name space.”

You can read this BBC story for the more easily digested version:

The internet is on the brink of the “biggest change” to its working “since it was invented 40 years ago”, the net regulator Icann has said.

The body said it was finalising plans to introduce web addresses using non-Latin characters.

The proposal — initially approved in 2008 — would allow domain names written in Asian, Arabic or other scripts.

My head is swimming with questions. A lot of them are about the specifics of getting this to work, considering how complex it still is to handle a lot of non-Latin writing systems with current fonts and technologies. For many scripts there’s still a big divide between encoding the characters and representing them visually, and still only limited solutions for bridging it. With Indic scripts, to pick an example I’ve been rather deeply immersed in lately, typing in the Unicode-based characters is just the first step: rendering names or words in a way that makes linguistic sense requires a little extra software processing, and some carefully built fonts. So are the address bars in web browsers going to handle OpenType substitutions to make that happen? Or is there going to be a different encoding solution that’s a little more WYSIWYG when it comes to typing in non-Latin addresses? I’m guessing a lot of those issues have come up in the ICANN proceedings, so I suppose it’s time to wade through them and see what the deal is.
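
For what it’s worth, the encoding half of that question already has a reasonably settled answer: under the existing IDNA standards, the Unicode name a person types gets converted into an ASCII-compatible “xn--” form (Punycode) before it ever reaches the DNS, and converted back for display. Here’s a minimal sketch of that round trip using Python’s built-in idna codec; the Cyrillic name is just an illustrative example, not a registration I’ve checked.

    # A rough sketch of the IDNA round trip: Unicode name in, ASCII-compatible
    # "xn--" Punycode form out, and back again. Python's built-in "idna" codec
    # performs the ToASCII / ToUnicode conversions from the IDNA spec.

    unicode_name = "пример.испытание"   # illustrative Cyrillic name ("example.test")

    ascii_name = unicode_name.encode("idna")
    print(ascii_name)                   # b'xn--e1afmkfd.xn--80akhbyknj4f'

    # The DNS only ever sees that ASCII form; browsers convert it back
    # to the Unicode form for display.
    print(ascii_name.decode("idna"))    # пример.испытание

Which, notably, says nothing at all about how the name gets drawn on screen: the shaping and font problems above sit entirely on the display side.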

In one sense, of course, this is brilliant. Domain names are part of our online identities and brands these days, and people should be able to use their own languages and writing systems to identify themselves online. It’s only fair, and it shows respect for the huge sectors of the world that don’t use our alphabet every day. Hopefully this will also encourage more technical support — and type design, if we’re starting to think about web fonts, too — for non-Latin scripts. (Trust me, it’s a typographic desert out there in the non-Latin world.)

But there will also be a certain amount of balkanization that’s likely to come of it, on top of just the language barrier. Linking to non-Latin domain names will require extra know-how about how to key in those names. It will require some understanding of encoding versus representation, writing direction, and even sensitivity to the differences between one character and another in an unfamiliar alphabet. Again, these are things that would all be good for people to learn, but we can’t even get people to use nice, clean HTML all the time. It would be a shame if the extra complexity keeps people from bothering to connect to the internationalized portion of the web. I suppose some sort of transliteration layer will spring up, but again…so many questions!
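
If that transliteration layer does appear, I imagine the core of it would be something like the sketch below: a small helper (my own hypothetical, not anything ICANN has specified) that takes a URL with a non-Latin host and hands back the equivalent URL with the host in its ASCII-compatible form, so someone without the right keyboard can still paste a working link.

    # Hypothetical sketch of a "transliteration layer" for linking: swap a
    # URL's non-Latin host for its ASCII-compatible (Punycode) equivalent.
    # Simplified: it ignores userinfo in the URL and assumes IDNA-style encoding.
    from urllib.parse import urlsplit, urlunsplit

    def ascii_host_url(url):
        parts = urlsplit(url)
        host = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            host = "%s:%d" % (host, parts.port)
        return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

    print(ascii_host_url("http://пример.испытание/"))
    # http://xn--e1afmkfd.xn--80akhbyknj4f/

Not pretty, and it doesn’t solve the reading problem, but it would at least keep the links resolvable.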