Loading GB or Big5 files
Java comes with two classes, InputStreamReader and OutputStreamWriter, that
translate local encodings into and out of Unicode. Fortunately, two of the
supported encodings are GB2312 (used in mainland China and Singapore) and Big5
(used in Hong Kong and Taiwan). Below is a sample program that converts a GB2312
file to UTF-8. It is derived from a sample program in Java in a Nutshell.
import java.io.*;

public class inputtest {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: java inputtest infile outfile");
            System.exit(1);
        }
        try {
            convert(args[0], args[1], "GB2312", "UTF8"); // or "BIG5"
        }
        catch (Exception e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }

    public static void convert(String infile, String outfile, String from, String to)
        throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams; use standard input/output when no file is named.
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;

        // Use default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");

        // Set up character streams
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));

        // Copy characters from input to output. The InputStreamReader
        // converts from the input encoding to Unicode, and the OutputStreamWriter
        // converts from Unicode to the output encoding. Characters that cannot be
        // represented in the output encoding are output as '?'
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1)
            w.write(buffer, 0, len);
        r.close();
        w.flush();
        w.close();
    }
}
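The same conversion can be tried without any files. The following is a minimal sketch, not part of the original program: the class name EncodingSketch and its hex helper are made up for illustration. It pushes the four GB2312 bytes for "\u4e2d\u6587" through the same reader/writer pair, once to UTF-8 and once to US-ASCII, where the characters cannot be represented and come out as '?'. It assumes the GB2312 charset is available in your Java runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class; the names here are illustrative only.
class EncodingSketch {
    // Re-encode a byte array from one named encoding to another,
    // passing through Java's internal Unicode chars in between.
    static byte[] convert(byte[] data, String from, String to) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), from);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(out, to);
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1) {
            w.write(buffer, 0, len);
        }
        r.close();
        w.close(); // closing flushes the encoder into the byte array
        return out.toByteArray();
    }

    // Format bytes as upper-case hex so the result is easy to inspect.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "\u4e2d\u6587" encoded as GB2312.
        byte[] gb = { (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4 };
        System.out.println(hex(convert(gb, "GB2312", "UTF-8")));    // E4B8ADE69687
        System.out.println(hex(convert(gb, "GB2312", "US-ASCII"))); // 3F3F, i.e. "??"
    }
}
```

Using byte arrays instead of files keeps the sketch self-contained; the copy loop is the same one used in convert above.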
Displaying Chinese
Finding Chinese Fonts
Java 2 allows the programmer to directly access the fonts on the
machine. The code sample below gets a list of all the fonts on the
system, and then checks each font to see if it can display a sample
Chinese string. Matching fonts are printed. Variations of the code
below can be used to automatically find Chinese fonts and set the font
of Swing components accordingly. The behavior of the canDisplayUpTo method
seems to have changed since Java 1.2; this sample works with Java 1.4.
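// Determine which fonts support Chinese here ...
// (requires java.awt.Font, java.awt.GraphicsEnvironment and java.util.Vector)
Vector chinesefonts = new Vector();
Font[] allfonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
int fontcount = 0; // number of fonts examined
String chinesesample = "\u4e00";
for (int j = 0; j < allfonts.length; j++) {
    // canDisplayUpTo returns -1 when the font can display the whole string
    if (allfonts[j].canDisplayUpTo(chinesesample) == -1) {
        chinesefonts.add(allfonts[j].getFontName());
        System.out.println(allfonts[j].getFontName());
    }
    fontcount++;
}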
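One possible variation, for setting a component's font automatically, is sketched below. The class ChineseFontPicker and its pick method are hypothetical names, not part of any library. Because getAllFonts returns one-point instances of each font, the sketch resizes the match with deriveFont before returning it.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;

// Hypothetical helper; class and method names are illustrative only.
class ChineseFontPicker {
    // Return the first candidate that can display every character of the
    // sample, resized to the requested point size, or null if none can.
    static Font pick(Font[] candidates, String sample, float size) {
        for (int i = 0; i < candidates.length; i++) {
            // canDisplayUpTo returns -1 when the whole string is displayable.
            if (candidates[i].canDisplayUpTo(sample) == -1) {
                return candidates[i].deriveFont(size);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Font[] all = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        Font chosen = pick(all, "\u4e00", 12f);
        if (chosen == null) {
            System.out.println("No installed font can display the sample.");
        } else {
            System.out.println("Using " + chosen.getFontName());
        }
    }
}
```

A Swing component could then be given the result with setFont, for example textArea.setFont(chosen), after checking that chosen is not null. A single sample character is a weak test, so a longer sample string may be a safer choice.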
font.properties
Before the introduction of Swing, a set of peerless components built on
the Java AWT, Java could not display Chinese except on Chinese operating
systems. With Swing, you can display Chinese in any component,
provided you have fonts that support Chinese on your system.
Previously, to make a font that could display Chinese available to Java, you
needed to modify a file called font.properties to list the available Chinese
fonts. This is not a simple process for the average user, and I think the code
above is easier to use. But in case you do need to modify font.properties, here
is an excerpt from my font.properties file, where Bitstream Cyberbit
is the Unicode font. A list of Unicode fonts supporting Chinese can be found
here.
dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialog.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialog.3=Bitstream Cyberbit
dialoginput.0=Courier New,ANSI_CHARSET
dialoginput.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.3=Bitstream Cyberbit
serif.0=Times New Roman,ANSI_CHARSET
serif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
serif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
serif.3=Bitstream Cyberbit
sansserif.0=Arial,ANSI_CHARSET
sansserif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.3=Bitstream Cyberbit
monospaced.0=Courier New,ANSI_CHARSET
monospaced.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.3=Bitstream Cyberbit
Java 1.2 includes Unicode fonts. Unfortunately, these fonts do not
support Chinese, Japanese, or Korean yet. For more general information on
fonts and Java, visit this
programmer's page.
Chinese and Swing
Swing components can display any Unicode character that you have the
font for. Here is a sample program that reads a GB file and displays
it in a JTextArea.
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class swingsample extends JFrame {
    private static JTextArea mTextArea;

    public swingsample(String filename) {
        super("GB File Viewer");
        createUI();
        try {
            loadfile(filename, "GB2312"); // or "BIG5"
        }
        catch (Exception loadexc) {
            // Report the problem instead of silently showing an empty window.
            mTextArea.setText("Could not load " + filename + ": " + loadexc.getMessage());
        }
        setVisible(true);
    }

    public static void loadfile(String filename, String enc)
        throws IOException, UnsupportedEncodingException
    {
        String newline;
        String buffer;
        InputStream in;
        newline = System.getProperty("line.separator");
        in = new FileInputStream(filename);
        // Set up character stream
        BufferedReader r = new BufferedReader(new InputStreamReader(in, enc));
        while ((buffer = r.readLine()) != null) {
            mTextArea.append(buffer + newline);
        }
        r.close();
    }

    protected void createUI() {
        setSize(500, 500);
        Container content = getContentPane();
        content.setLayout(new BorderLayout());
        mTextArea = new JTextArea();
        //mTextArea.setFont(new Font("Bitstream Cyberbit", Font.PLAIN, 12));
        JScrollPane scrollPane = new JScrollPane(mTextArea,
            JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
            JScrollPane.HORIZONTAL_SCROLLBAR_ALWAYS);
        content.add(scrollPane, BorderLayout.CENTER);
        // Exit the application when the window is closed.
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java swingsample filename");
            System.exit(1);
        }
        new swingsample(args[0]);
    }
} // swingsample
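The sample above decodes the file and fills the text area in one step. As an alternative sketch, the decoding can be kept in a helper that returns a String, which makes it easy to check without opening a window. The class TextLoader and its readText method are hypothetical names chosen for illustration, and the sketch assumes the GB2312 charset is available in your Java runtime.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;

// Hypothetical helper; class and method names are illustrative only.
class TextLoader {
    // Decode the whole file using the named encoding and return it as a String.
    static String readText(String filename, String enc) throws IOException {
        InputStream in = new FileInputStream(filename);
        try {
            Reader r = new BufferedReader(new InputStreamReader(in, enc));
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) != -1) {
                text.append(buffer, 0, len);
            }
            return text.toString();
        } finally {
            in.close(); // release the file even if decoding fails
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small GB2312 sample to a temporary file, then read it back.
        File tmp = File.createTempFile("gbsample", ".txt");
        tmp.deleteOnExit();
        OutputStream out = new FileOutputStream(tmp);
        out.write(new byte[] { (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4 });
        out.close();
        String text = readText(tmp.getPath(), "GB2312");
        System.out.println(text.equals("\u4e2d\u6587")); // true
    }
}
```

The returned string could then be handed to a JTextArea with setText. Unlike readLine, reading in blocks keeps the file's original line endings.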