Chinese Computing

Chinese in Java


Chinese Supplement to JDK2.0

The Java2 JDK now comes with a Chinese localized supplement.

Java's site also has a Chinese glossary of Java terms.

Compiling Java Source Files Containing Chinese

Java works internally with Unicode, so when compiling source files that use a Chinese encoding such as Big5 or GB2312, you need to tell the compiler which encoding the file uses so that it can convert the text to Unicode correctly.

javac -encoding big5 sourcefile.java
or
javac -encoding gb2312 sourcefile.java
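To see why the compiler has to be told the encoding, here is a minimal sketch that decodes the same bytes under two different charsets. The class and method names (EncodingMismatchDemo, decode) are made up for this illustration, and it assumes a JDK that ships the GB2312 charset. The sample string is written with Unicode escapes so that the file itself compiles under any encoding.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {
    // Decode raw bytes using the named charset.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source is pure ASCII.
        String original = "\u4e2d\u6587";

        // These are the bytes a GB2312 text editor would have saved.
        byte[] gbBytes = original.getBytes(Charset.forName("GB2312"));

        // Decoding with the matching charset recovers the text.
        String right = decode(gbBytes, "GB2312");
        // Decoding with the wrong charset yields unrelated characters.
        String wrong = decode(gbBytes, "ISO-8859-1");

        System.out.println("matching charset round-trips: " + right.equals(original));
        System.out.println("wrong charset round-trips:    " + wrong.equals(original));
    }
}
```

The compiler faces the same choice when it reads a source file: without the -encoding option it falls back to a default, which may not be the encoding the file was saved in.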

Loading GB or Big5 files

Java comes with classes called InputStreamReader and OutputStreamWriter that translate between local encodings and Unicode. Fortunately, two of the supported encodings are GB2312 (used in mainland China and Singapore) and Big5 (used in Hong Kong and Taiwan). Below is a sample program that converts a GB2312 file to UTF-8. It is derived from a sample program in Java in a Nutshell.

import java.io.*;

public class inputtest {
  
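  public static void main(String[] args) {
    // Expect the input and output file names on the command line.
    if (args.length < 2) {
      System.err.println("Usage: java inputtest infile outfile");
      System.exit(1);
    }

    try { convert(args[0], args[1], "GB2312", "UTF8"); } // or "BIG5"
    catch (Exception e) {
      System.err.println(e.getMessage());
      System.exit(1);
    }
  }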

  public static void convert(String infile, String outfile, String from, String to) 
       throws IOException, UnsupportedEncodingException
  {
    // set up byte streams
    InputStream in;
    if (infile != null) in = new FileInputStream(infile);
    else in = System.in;
    OutputStream out;
    if (outfile != null) out = new FileOutputStream(outfile);
    else out = System.out;

    // Use default encoding if no encoding is specified.
    if (from == null) from = System.getProperty("file.encoding");
    if (to == null) to = System.getProperty("file.encoding");

    // Set up character stream
    Reader r = new BufferedReader(new InputStreamReader(in, from));
    Writer w = new BufferedWriter(new OutputStreamWriter(out, to));

    // Copy characters from input to output.  The InputStreamReader
    // converts from the input encoding to Unicode, and the OutputStreamWriter
    // converts from Unicode to the output encoding.  Characters that cannot be
    // represented in the output encoding are output as '?'
    char[] buffer = new char[4096];
    int len;
    while((len = r.read(buffer)) != -1) 
      w.write(buffer, 0, len);
    r.close();
    w.flush();
    w.close();
  }

}
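On Java 7 and later the same conversion can be written more compactly with java.nio.file and try-with-resources. The following is only a sketch of that variation, not part of the original sample; the class name TranscodeFile is made up, and it assumes the GB2312 charset is available in the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranscodeFile {
    // Copy the text in 'in' to 'out', decoding with 'from' and encoding with 'to'.
    public static void convert(Path in, Path out, Charset from, Charset to)
            throws IOException {
        // try-with-resources closes (and flushes) both streams, even on error.
        try (Reader r = Files.newBufferedReader(in, from);
             Writer w = Files.newBufferedWriter(out, to)) {
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) != -1) {
                w.write(buffer, 0, len);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java TranscodeFile infile outfile");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]),
                Charset.forName("GB2312"), StandardCharsets.UTF_8);
    }
}
```

One difference in behavior is worth knowing: the reader and writer returned by Files throw an IOException when they meet bytes or characters that cannot be converted, whereas the InputStreamReader and OutputStreamWriter constructors used in the sample above substitute a replacement character.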

Displaying Chinese

Finding Chinese Fonts

Java 2 lets the programmer access the fonts installed on the machine directly. The code sample below gets a list of all the fonts on the system and checks each one to see whether it can display a sample Chinese string. The names of matching fonts are collected in a Vector. Variations of this code can be used to find Chinese fonts automatically and set the font of Swing components accordingly. The behavior of canDisplayUpTo seems to have changed since 1.2; this sample works with Java 1.4.

	// Needs java.awt.Font, java.awt.GraphicsEnvironment and java.util.Vector.
	// Determine which fonts support Chinese here ...
	Vector chinesefonts = new Vector();
	Font[] allfonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
	int fontcount = 0;  // number of fonts examined
	String chinesesample = "\u4e00";
	for (int j = 0; j < allfonts.length; j++) {
	    // canDisplayUpTo returns -1 when the font can display the whole string.
	    if (allfonts[j].canDisplayUpTo(chinesesample) == -1) {
	        chinesefonts.add(allfonts[j].getFontName());
	    }
	    fontcount++;
	}
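As one possible variation, the sketch below picks the first font that can display the sample and applies it to a Swing component. The class and method names (ChineseFontPicker, firstFontFor) are invented for this example, and the for-each loop assumes Java 5 or later.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import javax.swing.JTextArea;

public class ChineseFontPicker {
    // Return the first candidate able to display every character of
    // 'sample', or null if none of them can.
    public static Font firstFontFor(Font[] candidates, String sample) {
        for (Font f : candidates) {
            if (f.canDisplayUpTo(sample) == -1) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Font[] all = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        Font chinese = firstFontFor(all, "\u4e00");
        if (chinese == null) {
            System.out.println("No installed font can display the sample.");
            return;
        }
        // getAllFonts returns 1-point fonts, so derive a usable size first.
        JTextArea area = new JTextArea();
        area.setFont(chinese.deriveFont(14f));
        System.out.println("Using " + chinese.getFontName());
    }
}
```

A single sample character is a weak test. A font may contain that one character and still lack others, so a longer sample string drawn from the text you expect to show is a safer choice.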

font.properties

Before the introduction of Swing, a set of peerless (lightweight) components built on top of the AWT, Java could not display Chinese except on Chinese operating systems. With Swing, you can display Chinese in any component, provided you have fonts that support Chinese on your system. Previously, to make Java use a font that could display Chinese, you needed to modify a file called font.properties to list the available Chinese fonts. This is not a simple process for the average user, and I think the code above is easier to use. But in case you do need to modify font.properties, here is an excerpt from my font.properties file, where Bitstream Cyberbit is the Unicode font. A list of Unicode fonts supporting Chinese can be found here.

dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialog.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialog.3=Bitstream Cyberbit

dialoginput.0=Courier New,ANSI_CHARSET
dialoginput.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.3=Bitstream Cyberbit

serif.0=Times New Roman,ANSI_CHARSET
serif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
serif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
serif.3=Bitstream Cyberbit

sansserif.0=Arial,ANSI_CHARSET
sansserif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.3=Bitstream Cyberbit

monospaced.0=Courier New,ANSI_CHARSET
monospaced.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.3=Bitstream Cyberbit

Java 1.2 includes Unicode fonts. Unfortunately, these fonts do not support Chinese, Japanese, or Korean yet. For more general information on fonts and Java, visit this programmer's page.

Chinese and Swing

Swing components can display any Unicode character that you have the font for. Here is a sample program that reads a GB file and displays it in a JTextArea.

import java.lang.*;
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import java.util.*;

import javax.swing.*;

public class swingsample extends JFrame {
    private static JTextArea mTextArea;

    public swingsample(String filename) {
	super("GB File Viewer");
	createUI();
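	try {
	    loadfile(filename, "GB2312");  // or "BIG5"
	} 
	catch (Exception loadexc) {
	    // Report the failure instead of silently showing an empty window.
	    System.err.println("Could not load " + filename + ": " + loadexc.getMessage());
	}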
	setVisible(true);
    }

    public static void loadfile(String filename, String enc)
	throws IOException, UnsupportedEncodingException
    {
	String newline;
	String buffer;
	InputStream in;
	
	newline = System.getProperty("line.separator");
	
	in = new FileInputStream(filename);
	// Set up character stream
	BufferedReader r = new BufferedReader(new InputStreamReader(in, enc));
	while ((buffer = r.readLine()) != null) {
	    mTextArea.append(buffer + newline);
	}
	r.close();
    }

    protected void createUI() {
	setSize(500, 500);
	Container content = getContentPane();
	content.setLayout(new BorderLayout());
	
	mTextArea = new JTextArea();
	//mTextArea.setFont(new Font("Bitstream Cyberbit", Font.PLAIN, 12));
	JScrollPane scrollPane = new JScrollPane(mTextArea,
						 JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
						 JScrollPane.HORIZONTAL_SCROLLBAR_ALWAYS);
	content.add(scrollPane, BorderLayout.CENTER);
	
	// Exit the application when the window is closed.
	addWindowListener(new WindowAdapter() {
	    public void windowClosing(WindowEvent e) {
		System.exit(0);
	    }
	});
    }

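    public static void main(String[] args) {
	// The file to display must be given on the command line.
	if (args.length < 1) {
	    System.err.println("Usage: java swingsample filename");
	    System.exit(1);
	}
	new swingsample(args[0]);
    }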
    
} // swingsample

Chinese Resource Files

If you want to use locale-specific resource files for Chinese-speaking areas (e.g. zh, zh_CN, zh_TW), you can't just save them in GB, Big5, or another native Chinese encoding. You can create the resource files using GB, Big5, etc., but you must then convert them to the \uXXXX Unicode escape notation. This is easily done with the native2ascii tool included with the JDK.
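To illustrate what the escape notation looks like, here is a rough sketch of the same kind of conversion done in Java. It is not the implementation of native2ascii; the class and method names (UnicodeEscaper, toEscapes) are made up for the example. It replaces every non-ASCII character with its escape and then checks that java.util.Properties reads the escaped text back correctly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class UnicodeEscaper {
    // Replace each non-ASCII character with a backslash-u escape
    // followed by four hexadecimal digits.
    public static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // One line of a resource file, as it would look after being
        // read from a GB or Big5 file into a Java String.
        String line = "greeting=\u4f60\u597d";
        String escaped = toEscapes(line);
        System.out.println(escaped);

        // Properties decodes the escapes again when loading.
        Properties p = new Properties();
        p.load(new StringReader(escaped));
        System.out.println("\u4f60\u597d".equals(p.getProperty("greeting")));
    }
}
```

The escaped file contains only ASCII characters, so it reads the same regardless of the platform's default encoding.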

Inputting Chinese

Java 1.2 comes with a set of classes for interacting with the operating system's built-in input methods. Also, as of version 1.3 Java supports input methods that are independent of the OS. For more information on this, visit Sun's manual on using input methods.

Using the tutorial on the JavaSoft website as a guide, I've created six types of Chinese input methods that any program running on JDK 1.3 can use. After downloading the jar file with the input methods, copy it into the lib/ext directory of your JDK 1.3 or JRE 1.3. You will then be able to use them in any Java application. You will need to set your font.properties file to include a Chinese font for the characters to appear properly in the selection box.

Another possible way to input Chinese on Microsoft Windows is to use Microsoft's own Chinese input methods. However, depending on the computer, the characters may appear as question marks in the text field. This is a bug in current implementations of Java, but the next release of Java, 1.3.1, should fix it. The pure Java input methods below do not have this problem. One other solution for users of Windows 2000 is to switch your default locale to traditional or simplified Chinese.

To activate these input methods in Windows, click on the control box in the upper left-hand corner. One of the options will be to select input methods. You can then choose which of the installed input methods you want to use. Start typing pinyin. A box will appear beneath the current position with ten matching characters. To select one either type the number for it, hit the space bar for the top character, or start typing the next pinyin sequence and the top character will automatically be selected. If the desired character is not in the list, use the period "." to move forward and the comma "," to move back.

These input methods are a work in progress and I plan to improve them and add new ones. Along those lines I have included the source code for the input methods in each jar file and am putting them out as free, open source programs. Please improve and use them in your own programs and send back the improvements to incorporate in future versions.

  • Source code for all input methods
  • Chinese Input Methods JAR File, includes
    Pinyin w/o Tones - Simplified and Traditional Characters
    Pinyin with Tones - Simplified and Traditional Characters
    Pinyin w/o Tones - Simplified Characters
    Pinyin with Tones - Simplified Characters
    Pinyin w/o Tones - Traditional Characters
    Pinyin with Tones - Traditional Characters

Chinese Java Links