Chinese Computing

Creating Chinese Web Pages

HTML Issues

Chinese Character Sets

Two major computer encodings are used for Chinese: Big5 and GuoBiao (GB). Big5 encodes traditional characters and is used in Hong Kong and Taiwan, while GB encodes simplified characters and is used in mainland China and Singapore. A third option, Unicode, can represent most of the world's major languages, including both simplified and traditional Chinese. Unicode is supported by both Netscape and Internet Explorer and is growing in popularity.
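As a rough illustration of how the encodings differ, the following Python sketch encodes the same two characters, 中文, with Python's built-in gb2312, big5 and utf-8 codecs. Python is used only for demonstration, and the helper name encoded_bytes is hypothetical. GB2312 and Big5 both use two bytes per Chinese character but assign different byte values, while UTF-8 uses three bytes for each of these characters.

```python
from typing import Optional

# "中文" ("Chinese") is written identically in simplified and traditional
# script, so it can be encoded in GB2312, Big5 and Unicode alike.
TEXT = "中文"


def encoded_bytes(text: str, encoding: str) -> Optional[bytes]:
    """Return `text` encoded with the named codec, or None if some
    character is outside that character set."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name in ("gb2312", "big5", "utf-8"):
        data = encoded_bytes(TEXT, name)
        shown = data.hex() if data is not None else "not encodable"
        print(f"{name:7} {shown}")
```

The same text therefore produces three different byte sequences, which is why the browser has to be told which encoding a page uses.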

Which encoding to use depends on your target audience. Many web sites offer the same information in several encodings. Others avoid the problem altogether by rendering all the text as pictures, so that it can be viewed in any browser. This approach requires nothing of the browser, but it has two major disadvantages: the content of the page becomes difficult to change, and the pictures can add significantly to download time. Again, weigh the needs of the audience and the goals of the website when deciding on an approach.

Automatically Choosing the Encoding

Both major browsers can display Chinese on their own, as long as a Chinese font is installed on the system (see "Viewing Chinese Web Pages" below). A web page can tell the browser which encoding it uses, and the browser then uses this information to display the page correctly. This matters because a browser that displays a Big5 page as if it were GB will show incomprehensible garbage, and vice versa.
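To see why the declaration matters, the following Python sketch simulates a browser guessing the wrong encoding: the text is saved as Big5 bytes and then decoded as if it were GB2312. The function name misread is illustrative only; errors="replace" turns any byte sequence that is invalid in the guessed encoding into the U+FFFD replacement character instead of raising an error.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` the way the author saved it, then decode the bytes the
    way a browser that guessed the encoding might."""
    raw = text.encode(written_as)                 # bytes as stored in the file
    return raw.decode(read_as, errors="replace")  # undecodable bytes become U+FFFD


if __name__ == "__main__":
    print(misread("中文", "big5", "gb2312"))  # wrong guess: garbage
    print(misread("中文", "big5", "big5"))    # right guess: 中文
```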

To have the browser choose the character set automatically, add a META tag to the HTML of the page. It belongs in the header, between <HEAD> and </HEAD>, in the same general place as the TITLE. The tag looks like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=charset_name">
where charset_name is gb2312, big5, or utf-8.
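A minimal sketch, assuming the page is generated by a Python script, of keeping the META declaration and the actual bytes in agreement. Python's codec names gb2312, big5 and utf-8 happen to match the charset_name values above, so one string serves both purposes. The function name build_page and the page skeleton are hypothetical; placing the META tag first in the HEAD, before any Chinese text in the TITLE, is a design choice of this sketch.

```python
import html


def build_page(title: str, body: str, charset: str) -> bytes:
    """Return an HTML page encoded in `charset`, with a META tag declaring
    that same charset (gb2312, big5 or utf-8)."""
    page = (
        "<HTML>\n<HEAD>\n"
        # Declared first, before any Chinese text in the TITLE.
        f'<META HTTP-EQUIV="content-type" CONTENT="text/html; charset={charset}">\n'
        f"<TITLE>{html.escape(title)}</TITLE>\n"
        "</HEAD>\n<BODY>\n"
        f"<P>{html.escape(body)}</P>\n"
        "</BODY>\n</HTML>\n"
    )
    # Encode with the same codec the META tag names; otherwise the
    # declaration would not match the bytes the browser receives.
    return page.encode(charset)


if __name__ == "__main__":
    data = build_page("中文", "中文网页", "gb2312")
    print(data.decode("gb2312"))
```

When saving the result, write the returned bytes to a file opened in binary mode ("wb") so that nothing re-encodes them.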

Font Tags

HTML provides the FONT tag, whose FACE attribute lets a page specify the font used to display the tagged text. If the specified font does not support Chinese, or is not installed on the user's system, the browser may display the Chinese characters as boxes. Internet Explorer can substitute another font in this case, but Netscape Navigator 4 cannot. To make pages viewable in as many browsers as possible, avoid specifying a font on pages with Chinese text and let the browser use its default Chinese font instead. Be aware that some web page creation software, such as Microsoft FrontPage, adds this font face information automatically.

Users who are trying to view Chinese on pages with this problem can override the font face setting in Netscape Navigator by going to Preferences, selecting Fonts under Appearance, and then selecting "Use my default fonts, overriding document-specified fonts". Netscape 6 will no longer have this problem.

Spaces between Characters

Early versions of Netscape and Internet Explorer were biased towards English and did not support East Asian languages, including Chinese. These browsers relied on spaces to break text into words and lines. Since Chinese does not use spaces, the browser could break Chinese text in inappropriate places, even in the middle of a double-byte character, or not break the line at all, leaving a long run of text extending past the edge of the window. To help these browsers format Chinese text, many web pages inserted a space between each pair of characters. Even if the browser did not know how to handle Chinese, it could then find acceptable places to break the text into lines.

Later browser versions understand how to lay out Chinese text properly, so this is no longer necessary and can even make the text harder to read. Still, authors creating pages that may be displayed on older browsers, or on browsers that expect only English, should be aware of this technique.
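For authors who still need this technique, here is a minimal Python sketch of inserting the spaces automatically. It assumes, for brevity, that "Chinese character" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); Chinese punctuation and the extension blocks are not handled, and the function names are illustrative.

```python
def is_chinese(ch: str) -> bool:
    """Rough test: True only for characters in the basic CJK Unified
    Ideographs block (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def space_out(text: str) -> str:
    """Insert a space after each Chinese character so that older,
    English-only browsers can find line-break points."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Skip the space at the end of the text or before existing whitespace.
        if is_chinese(ch) and nxt and not nxt.isspace():
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    print(space_out("中文网页"))  # 中 文 网 页
```

Running this on the text before it goes into the HTML keeps the source copy free of the extra spaces, so they can be dropped later without editing the content by hand.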

Ruby Annotations

Ruby is a short annotation placed above a word or words in the main text. In Japanese writing it is used to show the kana pronunciation of one or more kanji. It can also be used to add pinyin above a Chinese character, or, together with vertical text support, to add bopomofo beside a Chinese character. MS Internet Explorer supports the RUBY tag starting with version 5.0.
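A small Python sketch that generates RUBY markup from characters paired with their pinyin. The helper name ruby_markup is hypothetical, and the pinyin must be supplied by the caller; looking up pronunciations is outside the scope of this example. Each pair becomes its own RUBY element with the annotation in an RT element.

```python
import html
from typing import Iterable, Tuple


def ruby_markup(pairs: Iterable[Tuple[str, str]]) -> str:
    """Wrap each (base text, annotation) pair in its own RUBY element,
    with the annotation (e.g. pinyin) in an RT element."""
    parts = []
    for base, annotation in pairs:
        # Escape both parts so stray <, > or & cannot break the markup.
        parts.append(
            f"<RUBY>{html.escape(base)}<RT>{html.escape(annotation)}</RT></RUBY>"
        )
    return "".join(parts)


if __name__ == "__main__":
    # Sample data: pinyin is supplied by hand here.
    print(ruby_markup([("中", "zhōng"), ("文", "wén")]))
```

Annotating a whole word instead of single characters only requires passing a longer base string, for example ("中文", "zhōngwén").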

Vertical Text

Starting with MS Internet Explorer 5.5, web pages can display vertical text in the traditional layout formerly used in China and still used in Taiwan. Chinese characters remain upright, but English and other text is rotated 90 degrees clockwise.

International Layout in HTML

Official CSS International Layout Description Page

Using Chinese with ASP

Web Sites with Multiple Languages (scroll down a few pages)

Web Page Creation Software

Specifying Page Language

When using a web page editor, it is also possible to specify what language the page is in and which encoding should be used to display it. This helps the browser show your page correctly. Here is how to set the encoding in several web page design programs:

  1. FrontPage Express: From the main menu, select File, then Page Properties. The dialog that appears has an "HTML Encoding" box. For both displaying and saving the page, select the appropriate encoding: Simplified Chinese (GB2312), Traditional Chinese (Big5), or Multilingual (UTF-8).
  2. Netscape Composer: From the main menu, select View, Character Set, and then the appropriate encoding: Traditional Chinese (Big5), Simplified Chinese (GB2312), or Unicode (UTF-8).
  3. Dreamweaver: Select Modify, Page Properties, and then the appropriate encoding.

Avoiding Ampersand Escapes

Chinese text on the computer is stored using byte values outside the ASCII range used for English. Web editors can mistake these bytes for special characters such as the copyright sign and replace them with ampersand escape sequences (for example, &copy; or &#169;). Chinese web page creators should be aware of this. Even though pages created this way may still be viewable to the end user, authors should make sure their pages do not encode the Chinese bytes this way. A Perl converter from ampersand escapes back to bytes is available.
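The Perl converter mentioned above is not reproduced here. As an independent, hypothetical sketch of the same idea in Python: the page is read as raw bytes, and every ampersand escape that stands for a value from 0x80 to 0xFF (for example &Ouml; or &#214;, which is the first byte of 中 in GB2312) is turned back into that single byte. Escapes for ASCII characters such as &amp; and &lt;, and references to real Unicode characters such as &#20013;, are left alone. A deliberate escape such as &copy; will also be converted to a raw byte, so the result should be reviewed.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#214;), hexadecimal (&#xD6;) and named (&Ouml;) escapes.
ESCAPE = re.compile(r"&(?:#(\d+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


def _escape_to_byte(match):
    """Return the single byte (as a Latin-1 char) an escape stands for,
    or the escape unchanged if it is not in the range 0x80-0xFF."""
    decimal, hexa, name = match.groups()
    if decimal is not None:
        code = int(decimal)
    elif hexa is not None:
        code = int(hexa, 16)
    else:
        code = name2codepoint.get(name, -1)  # -1 for unknown names
    if 0x80 <= code <= 0xFF:
        return chr(code)
    return match.group(0)


def unescape_bytes(data: bytes) -> bytes:
    """Turn escaped high bytes in an HTML file back into raw bytes."""
    # Latin-1 maps every byte 0x00-0xFF to exactly one character, so this
    # decode/encode round trip leaves all other bytes unchanged.
    text = data.decode("latin-1")
    return ESCAPE.sub(_escape_to_byte, text).encode("latin-1")


if __name__ == "__main__":
    mangled = b"<P>&Ouml;&ETH;&Icirc;&Auml;</P>"
    print(unescape_bytes(mangled).decode("gb2312"))  # <P>中文</P>
```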

Links

Viewing Chinese Web Pages

Windows

Both major browsers can support Chinese without any additional programs. All you need is the right font, and many good free fonts are available for download. The best method is to download Microsoft's free language packs and input methods for Simplified and Traditional Chinese.

Installing these language packs automatically sets up Internet Explorer for Chinese. Netscape needs one more step. From Netscape's main menu, select "Edit", then "Preferences". In the window that appears, select "Appearance" and then "Fonts". First select "Simplified Chinese" as the encoding, and choose "MS Song" or "MS Hei" for both the proportional and the fixed-width font. Then select the "Traditional Chinese" encoding and choose "MingLiU" as the font. A larger font size may also be easier on your eyes.

As you browse different Chinese websites, you will encounter two situations. Some web pages "know" that they are in Chinese, and the browser automatically uses the Chinese fonts to display them. For pages that lack this information, you can switch to Chinese manually. In Netscape, select "View" and then "Character Set" from the main menu. In Internet Explorer, select "View" and then "Fonts".

These fonts also let you read Chinese e-mail (in Netscape Messenger and Outlook) and write it (in Outlook).

There are other fonts you can use on Windows instead of the Microsoft fonts. One possibility is the Bitstream Cyberbit font. The method above should also work with browsers on other operating systems once Chinese fonts are installed. See the fonts section on this website for more information.

Macintosh

Unix

In Pictures

  • Replace your_url with the address of the Chinese web page you want to view:
    http://www.freeworld-2000.com/cgi-bin/unitext.cgi?in=tw&out=tw&url=http://your_url
  • OK88 Chinese Page Viewer

More Links