Chinese Computing

When contemplating localizing an application into Chinese, one of the first things to look at is which market you are targeting. The People's Republic of China (mainland China) uses a variant of written Chinese called simplified characters, represented by the GB-2312 character set, while Hong Kong and Taiwan use traditional characters in the Big5 character set. Mainland China and Taiwan both speak the Mandarin dialect of Chinese, while Hong Kong mostly speaks Cantonese. Mandarin is by far the most prevalent dialect; it is the official dialect of Taiwan and mainland China and is also becoming more popular in Hong Kong.

One of the main difficulties in working with Chinese is that encodings such as GB-2312 and Big5 use two bytes to represent each Chinese character. Applications (especially editors) that assume one byte per letter/character will not always work properly for Chinese. Chinese also has no concept of case, so programs should be careful with functions that change text case byte by byte, which can garble the Chinese text. Chinese also does not put spaces between words, so functions that tokenize a string based on spaces or punctuation will need to be adapted. Sorting is another process that needs some modification for Chinese.

A good option is to keep the program code flexible by storing all language-dependent text in resource files and having the program load the appropriate resources for the locale it is running in or that the user has selected. That way new languages are much easier to add later. How this is done depends on the programming language and operating system you are dealing with.
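As one possible shape for this approach, the sketch below keeps every user-visible string in a per-locale table and looks strings up by key. The RESOURCES dictionary is a hypothetical in-memory stand-in for real resource files, and the locale names, keys, translations, and function names are illustrative assumptions rather than any particular platform's API.

```python
# Hypothetical stand-in for per-locale resource files; in a real program each
# table would typically be loaded from its own file on disk.
RESOURCES = {
    "en": {"greeting": "Hello", "quit": "Quit"},
    "zh_CN": {"greeting": "你好", "quit": "退出"},   # simplified characters
    "zh_TW": {"greeting": "你好", "quit": "結束"},   # traditional characters
}

DEFAULT_LOCALE = "en"


def load_resources(locale_name):
    """Return the string table for locale_name, filled in from the default."""
    table = dict(RESOURCES[DEFAULT_LOCALE])
    table.update(RESOURCES.get(locale_name, {}))
    return table


def translate(key, locale_name):
    """Look up a string by key; fall back to the key itself if it is unknown."""
    return load_resources(locale_name).get(key, key)


if __name__ == "__main__":
    for name in ("en", "zh_CN", "zh_TW"):
        print(name, translate("quit", name))
```

With this layout, adding a language means adding one more table, and the program logic stays untouched. Mainland China and Taiwan get separate tables because they differ in both characters and wording.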

For further information, you may want to contact one of these localization companies that can work with Chinese.

Further Information