Readers and Writers
Many programmers have a bad habit of writing code as if all text were ASCII or at least in the native encoding of the platform. While some older, simpler network protocols, such as daytime, quote of the day, and chargen, do specify ASCII encoding for text, this is not true of HTTP and many other more modern protocols, which allow a wide variety of localized encodings, such as KOI8-R Cyrillic, Big-5 Chinese, and ISO 8859-2 for most Central European languages. Java's native character set is the UTF-16 encoding of Unicode. When the encoding is no longer ASCII, the assumption that bytes and chars are essentially the same things also breaks down. Consequently, Java provides an almost complete mirror of the input and output stream class hierarchy designed for working with characters instead of bytes.
In this mirror-image hierarchy, two abstract superclasses define the basic API for reading and writing characters. The java.io.Reader class specifies the API by which characters are read. The java.io.Writer class specifies the API by which characters are written. Wherever input and output streams use bytes, readers and writers use Unicode characters. Concrete subclasses of Reader and Writer allow particular sources to be read and targets to be written. Filter readers and writers can be attached to other readers and writers to provide additional services or interfaces.
The most important concrete subclasses of Reader and Writer are the InputStreamReader and the OutputStreamWriter classes. An InputStreamReader contains an underlying input stream from which it reads raw bytes. It translates these bytes into Unicode characters according to a specified encoding. An OutputStreamWriter receives Unicode characters from a running program. It then translates those characters into bytes using a specified encoding and writes the bytes onto an underlying output stream. In addition to these two classes, the java.io package provides several raw reader and writer classes that read and write characters without directly requiring an underlying stream, including:
FileReader
FileWriter
StringReader
StringWriter
CharArrayReader
CharArrayWriter
The first two classes in this list work with files and the last four work inside Java, so they aren't of great use for network programming. However, aside from different constructors, these classes have pretty much the same public interface as all other reader and writer classes.
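Because the raw classes in the list above share the Reader and Writer interface, code written against those two abstract classes works with any of them. A minimal sketch, assuming a hypothetical helper class CopyChars; the same copy() method would accept a FileReader, an InputStreamReader, or any other Reader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class CopyChars {
    // Copies every character from in to out using only methods declared
    // in Reader and Writer, so it works with any concrete subclass.
    public static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long count = copy(new StringReader("Network"), out);
        System.out.println(count + " chars: " + out); // prints "7 chars: Network"
    }
}
```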
Writers
The Writer class mirrors the java.io.OutputStream class. It's abstract and has two protected constructors. Like OutputStream, the Writer class is never used directly; instead, it is used polymorphically, through one of its subclasses. It has five write() methods as well as a flush() and a close() method:
protected Writer()
protected Writer(Object lock)
public abstract void write(char[] text, int offset, int length) throws IOException
public void write(int c) throws IOException
public void write(char[] text) throws IOException
public void write(String s) throws IOException
public void write(String s, int offset, int length) throws IOException
public abstract void flush() throws IOException
public abstract void close() throws IOException
The write(char[] text, int offset, int length) method is the base method in terms of which the other four write() methods are implemented. A subclass must override at least this method as well as flush() and close(), although most override some of the other write() methods as well in order to provide more efficient implementations. For example, given a Writer object w, you can write the string "Network" like this:
char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
w.write(network, 0, network.length);
The same task can be accomplished with these other methods, as well:
w.write(network);
for (int i = 0; i < network.length; i++) w.write(network[i]);
w.write("Network");
w.write("Network", 0, 7);
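All of these examples are different ways of expressing the same thing. Which you use in any given situation is mostly a matter of convenience and taste. However, how many and which bytes are written by these lines depends on the encoding w uses. If it's using big-endian UTF-16, it will write these 14 bytes (shown here in hexadecimal) in this order:
00 4E 00 65 00 74 00 77 00 6F 00 72 00 6B
On the other hand, if w uses little-endian UTF-16, this sequence of 14 bytes is written:
4E 00 65 00 74 00 77 00 6F 00 72 00 6B 00
If w uses Latin-1, UTF-8, or MacRoman, this sequence of seven bytes is written:
4E 65 74 77 6F 72 6B
Other encodings may write still different sequences of bytes. The exact output depends on the encoding.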
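The byte sequences above can be checked directly by routing the string through an OutputStreamWriter into a ByteArrayOutputStream. A minimal sketch; the class EncodingBytes and its hexBytes() helper are hypothetical, and MacRoman is left out because it is not one of the encodings every Java platform is guaranteed to support:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingBytes {
    // Writes text through an OutputStreamWriter using the named encoding
    // and returns the bytes that reached the underlying stream, in hex.
    public static String hexBytes(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, encoding)) {
            w.write(text);
        } // close() flushes the writer's internal buffer into bytes
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String enc : new String[] {"UTF-16BE", "UTF-16LE", "ISO-8859-1", "UTF-8"}) {
            System.out.println(enc + ": " + hexBytes("Network", enc));
        }
    }
}
```

Note that the names UTF-16BE and UTF-16LE produce exactly the 14-byte sequences shown; the plain name UTF-16 encodes big-endian but prepends a byte-order mark (FE FF), so it writes 16 bytes.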
Writers may be buffered, either directly by being chained to a BufferedWriter or indirectly because their underlying output stream is buffered. To force a write to be committed to the output medium, invoke the flush() method:
w.flush();
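Since every write() variant funnels into write(char[], int, int), a subclass that implements only that method plus flush() and close() already supports all five calls shown earlier. A minimal sketch; CollectingWriter is a hypothetical class that simply accumulates characters in memory:

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical minimal Writer: it overrides only the abstract methods.
// The inherited write(int), write(char[]), write(String), and
// write(String, int, int) all end up calling write(char[], int, int).
public class CollectingWriter extends Writer {
    private final StringBuilder text = new StringBuilder();
    private boolean closed = false;

    @Override
    public void write(char[] cbuf, int offset, int length) throws IOException {
        if (closed) throw new IOException("Stream closed");
        text.append(cbuf, offset, length);
    }

    @Override
    public void flush() throws IOException {
        if (closed) throw new IOException("Stream closed");
        // nothing is buffered here, so there is nothing to push onward
    }

    @Override
    public void close() {
        closed = true;
    }

    public String contents() {
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        CollectingWriter w = new CollectingWriter();
        char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
        w.write(network, 0, network.length);
        w.write(network);
        for (char c : network) w.write(c);
        w.write("Network");
        w.write("Network", 0, 7);
        System.out.println(w.contents()); // "Network" five times in a row
    }
}
```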
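The close() method behaves similarly to the close() method of OutputStream. close() flushes the writer, then closes the underlying output stream and releases any resources associated with it:
public abstract void close() throws IOException
After a writer has been closed, further writes throw IOExceptions.
OutputStreamWriter
OutputStreamWriter is the most important concrete subclass of Writer. An OutputStreamWriter receives characters from a Java program. It converts these into bytes according to a specified encoding and writes them onto an underlying output stream. Its constructors specify the output stream to write to and the encoding to use:
public OutputStreamWriter(OutputStream out, String encoding) throws UnsupportedEncodingException
public OutputStreamWriter(OutputStream out)
Valid encodings are listed in the documentation for Sun's native2ascii tool included with the JDK and available from http://java.oracle.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html. If no encoding is specified, the default encoding for the platform is used. (Historically, in the United States, the default encoding was ISO Latin-1 on Solaris and Windows and MacRoman on the Mac; since Java 18, the default is UTF-8 on all platforms.) For example, this code fragment writes a string of Greek text in the Cp1253 Windows Greek encoding: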
OutputStreamWriter w = new OutputStreamWriter(
new FileOutputStream("OdysseyB.txt"), "Cp1253");
w.write("
");
Other than the constructors, OutputStreamWriter has only the usual Writer methods (which are used exactly as they are for any Writer class) and one method to return the encoding of the object:
public String getEncoding()
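A short sketch of getEncoding() and of the closing behavior described earlier, using an in-memory ByteArrayOutputStream in place of a file or socket. The class OutputStreamWriterDemo and its methods are hypothetical; the point to notice is that getEncoding() reports the historical name ("UTF8"), which need not match the name passed to the constructor:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class OutputStreamWriterDemo {
    // Returns the name getEncoding() reports for the requested encoding.
    public static String encodingName(String requested) throws IOException {
        OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream(), requested);
        String name = w.getEncoding();
        w.close();
        return name;
    }

    // Counts the bytes text occupies in the given encoding.
    public static int encodedLength(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter w = new OutputStreamWriter(bytes, encoding);
        w.write(text);
        w.close(); // flushes the encoder's buffer into bytes
        return bytes.size();
    }

    // Closes the writer, then reports whether a further write was rejected.
    public static boolean rejectsWriteAfterClose() throws IOException {
        OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
        w.write("Network");
        w.close();
        try {
            w.write("more");
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodingName("UTF-8"));                  // UTF8
        System.out.println(encodedLength("caf\u00e9", "UTF-8"));     // 5
        System.out.println(encodedLength("caf\u00e9", "ISO-8859-1")); // 4
        System.out.println(rejectsWriteAfterClose());                // true
    }
}
```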
Readers
The Reader class mirrors the java.io.InputStream class. It's abstract with two protected constructors. Like InputStream and Writer, the Reader class is never used directly, only through one of its subclasses. It has three read() methods, as well as skip(), close(), ready(), mark(), reset(), and markSupported() methods:
protected Reader()
protected Reader(Object lock)
public abstract int read(char[] text, int offset, int length) throws IOException
public int read() throws IOException
public int read(char[] text) throws IOException
public long skip(long n) throws IOException
public boolean ready() throws IOException
public boolean markSupported()
public void mark(int readAheadLimit) throws IOException
public void reset() throws IOException
public abstract void close() throws IOException
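The methods listed above can be exercised on a StringReader, which supports marking. A minimal sketch; ReaderTour is a hypothetical class, and the comments trace what each call returns for the input "Hello, network":

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderTour {
    public static String tour() throws IOException {
        Reader r = new StringReader("Hello, network");
        StringBuilder log = new StringBuilder();

        log.append((char) r.read());        // read(): 'H'
        char[] chunk = new char[4];
        int n = r.read(chunk);              // read(char[]): "ello"
        log.append(chunk, 0, n);
        r.skip(2);                          // skip(): passes over ", "
        if (r.markSupported()) {            // true for StringReader
            r.mark(100);                    // remember this position
            log.append((char) r.read());    // 'n'
            r.reset();                      // go back to the mark
        }
        n = r.read(chunk, 0, 3);            // read(char[], int, int): "net"
        log.append(chunk, 0, n);
        r.close();
        return log.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tour()); // Hellonnet
    }
}
```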
The read(char[] text, int offset, int length) method is the fundamental method through which the other two read() methods are implemented. A subclass must override at least this method as well as close(), although most will override some of the other read() methods as well in order to provide more efficient implementations.
Most of these methods are easily understood by analogy with their InputStream counterparts. The read() method returns a single Unicode character as an int with a value from 0 to 65,535 or -1 on end of stream. The read(char[] text) method tries to fill the array text with characters and returns the actual number of characters read or -1 on end of stream. The read(char[] text, int offset, int length) method attempts to read length characters into the subarray of text beginning at offset and continuing for length characters. It also returns the actual number of characters read or -1 on end of stream. The skip(long n) method skips up to n characters and returns the number actually skipped. The mark() and reset() methods allow some readers to reset back to a marked position in the character sequence. The markSupported() method tells you whether the reader supports marking and resetting. The close() method closes the reader and any underlying input stream so that further attempts to read from it throw IOExceptions.
The exception to the rule of similarity is ready(), which has the same general purpose as available() but not quite the same semantics, even modulo the byte-to-char conversion. Whereas available() returns an int specifying a minimum number of bytes that may be read without blocking, ready() only returns a boolean indicating whether the reader may be read without blocking. The problem is that some character encodings, such as UTF-8, use different numbers of bytes for different characters. Thus, it's hard to tell how many characters are waiting in the network or filesystem buffer without actually reading them out of the buffer.
InputStreamReader is the most important concrete subclass of Reader. An InputStreamReader reads bytes from an underlying input stream such as a FileInputStream or TelnetInputStream. It converts these into characters according to a specified encoding and returns them. The constructor specifies the input stream to read from and the encoding to use:
public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding) throws UnsupportedEncodingException
If no encoding is specified, the default encoding for the platform is used. If an unknown encoding is specified, then an UnsupportedEncodingException is thrown. For example, this method reads an input stream and converts it all to one Unicode string using the MacCyrillic encoding:
public static String getMacCyrillicString(InputStream in) throws IOException {
InputStreamReader r = new InputStreamReader(in, "MacCyrillic");
StringBuffer sb = new StringBuffer( );
int c;
while ((c = r.read( )) != -1) sb.append((char) c);
r.close( );
return sb.toString( );
}
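The same method generalizes to any encoding by taking the name as a parameter. A minimal sketch; DecodeStream and readAll() are hypothetical names, and the demo uses UTF-8 rather than MacCyrillic because UTF-8 is guaranteed to be available on every Java platform. It also shows the UnsupportedEncodingException thrown for an unknown name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DecodeStream {
    // Reads the whole stream and decodes it with the named encoding.
    // An unknown name makes the InputStreamReader constructor throw
    // UnsupportedEncodingException before anything is read.
    public static String readAll(InputStream in, String encoding) throws IOException {
        try (Reader r = new InputStreamReader(in, encoding)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442"; // six Cyrillic letters
        byte[] utf8 = russian.getBytes(StandardCharsets.UTF_8);  // 12 bytes
        System.out.println(readAll(new ByteArrayInputStream(utf8), "UTF-8").equals(russian)); // true
        try {
            readAll(new ByteArrayInputStream(utf8), "no-such-encoding");
        } catch (UnsupportedEncodingException ex) {
            System.out.println("Unsupported: " + ex.getMessage());
        }
    }
}
```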
Filter Readers and Writers
The InputStreamReader and OutputStreamWriter classes act as decorators on top of input and output streams that change the interface from a byte-oriented interface to a character-oriented interface. Once this is done, additional character-oriented filters can be layered on top of the reader or writer using the java.io.FilterReader and java.io.FilterWriter classes. As with filter streams, there are a variety of classes that perform specific filtering, although not all of them actually extend FilterReader or FilterWriter (of those listed here, only PushbackReader does). They include:
BufferedReader
BufferedWriter
LineNumberReader
PushbackReader
PrintWriter
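A filter of your own follows the FilterReader pattern: pass the wrapped reader to the protected constructor and override the read() methods. A minimal sketch; UpperCaseReader is a hypothetical example class, not part of java.io:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical filter that upper-cases every character passing through it.
public class UpperCaseReader extends FilterReader {
    public UpperCaseReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read(); // FilterReader delegates to the wrapped reader
        return c == -1 ? -1 : Character.toUpperCase((char) c);
    }

    @Override
    public int read(char[] text, int offset, int length) throws IOException {
        int n = super.read(text, offset, length);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = offset; i < offset + n; i++) {
            text[i] = Character.toUpperCase(text[i]);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new UpperCaseReader(new StringReader("network"));
        char[] buf = new char[16];
        int n = r.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n)); // NETWORK
    }
}
```

Reader.read(char[]) is implemented in terms of read(char[], int, int), so overriding these two methods covers all three read() variants.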
Buffered readers and writers
The BufferedReader and BufferedWriter classes are the character-based equivalents of the byte-oriented BufferedInputStream and BufferedOutputStream classes. Where BufferedInputStream and BufferedOutputStream use an internal array of bytes as a buffer, BufferedReader and BufferedWriter use an internal array of chars.
When a program reads from a BufferedReader, text is taken from the buffer rather than directly from the underlying input stream or other text source. When the buffer empties, it is filled again with as much text as possible, even if not all of it is immediately needed, making future reads much faster. When a program writes to a BufferedWriter, the text is placed in the buffer. The text is moved to the underlying output stream or other target only when the buffer fills up or when the writer is explicitly flushed, which can make writes much faster than would otherwise be the case.
BufferedReader and BufferedWriter have the usual methods associated with readers and writers, like read(), ready(), write(), and close(). They each have two constructors that chain the BufferedReader or BufferedWriter to an underlying reader or writer; the second also sets the size of the buffer. If the size is not set, the default size of 8,192 characters is used:
public BufferedReader(Reader in, int bufferSize)
public BufferedReader(Reader in)
public BufferedWriter(Writer out)
public BufferedWriter(Writer out, int bufferSize)
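The write-side buffering is easy to observe with a StringWriter as the target: nothing reaches it until the BufferedWriter is flushed. A minimal sketch; BufferingDemo and beforeAndAfterFlush() are hypothetical names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class BufferingDemo {
    // Returns what the target held before and after the flush.
    public static String[] beforeAndAfterFlush(String text) throws IOException {
        StringWriter target = new StringWriter();
        BufferedWriter out = new BufferedWriter(target, 1024);
        out.write(text);                   // lands in the char buffer only
        String before = target.toString();
        out.flush();                       // moves the buffer into target
        String after = target.toString();
        out.close();
        return new String[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        String[] snapshots = beforeAndAfterFlush("Network");
        System.out.println("before flush: \"" + snapshots[0] + "\""); // before flush: ""
        System.out.println("after flush: \"" + snapshots[1] + "\"");  // after flush: "Network"
    }
}
```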
For example, the earlier getMacCyrillicString() example was less than efficient because it read characters one at a time. Since MacCyrillic is a 1-byte character set, it also read bytes one at a time. However, it's straightforward to make it run faster by chaining a BufferedReader to the InputStreamReader, like this:
public static String getMacCyrillicString(InputStream in) throws IOException {
Reader r = new InputStreamReader(in, "MacCyrillic");
r = new BufferedReader(r, 1024);
StringBuffer sb = new StringBuffer( );
int c;
while ((c = r.read( )) != -1) sb.append((char) c);
r.close( );
return sb.toString( );
}
All that was needed to buffer this method was one additional line of code. None of the rest of the algorithm had to change, since the only InputStreamReader methods used were the read() and close() methods declared in the Reader superclass and shared by all Reader subclasses, including BufferedReader. The BufferedReader class also has a readLine() method that reads a single line of text and returns it as a string:
public String readLine() throws IOException
This method is supposed to replace the deprecated readLine() method in DataInputStream, and it has mostly the same behavior as that method. The big difference is that by chaining a BufferedReader to an InputStreamReader, you can correctly read lines in character sets other than the default encoding for the platform. Unfortunately, this method shares the same bugs as the readLine() method in DataInputStream, discussed earlier in this chapter. That is, readLine() tends to hang its thread when reading streams where lines end in carriage returns, as is commonly the case when the streams derive from classic Mac OS programs or text files. Consequently, you should scrupulously avoid this method in network programs. It's not all that difficult, however, to write a safe version of this class that correctly implements the readLine() method. Example 4-1 is such a SafeBufferedReader class. It has exactly the same public interface as BufferedReader; it just has a slightly different private implementation. I'll use this class in future chapters in situations where it's extremely convenient to have a readLine() method.
Example 4-1. The SafeBufferedReader class
package com.macfaq.io;
import java.io.*;
public class SafeBufferedReader extends BufferedReader {
public SafeBufferedReader(Reader in) {
this(in, 1024);
}
public SafeBufferedReader(Reader in, int bufferSize) {
super(in, bufferSize);
}
private boolean lookingForLineFeed = false;
public String readLine( ) throws IOException {
StringBuffer sb = new StringBuffer("");
while (true) {
int c = this.read( );
if (c == -1) { // end of stream
if (sb.length( ) == 0) return null; // no characters since the last line terminator
return sb.toString( );
}
else if (c == '\n') {
if (lookingForLineFeed) {
lookingForLineFeed = false;
continue;
}
else {
return sb.toString( );
}
}
else if (c == '\r') {
lookingForLineFeed = true;
return sb.toString( );
}
else {
lookingForLineFeed = false;
sb.append((char) c);
}
}
}
}
The BufferedWriter class adds one new method not included in its superclass, called newLine(), also geared toward writing lines:
public void newLine() throws IOException
This method inserts a platform-dependent line-separator string into the output. The line.separator system property determines exactly what the string is: probably a linefeed on Unix and Mac OS X, a carriage return on Mac OS 9, and a carriage return/linefeed pair on Windows. Since network protocols generally specify the required line terminator, you should not use this method for network programming. Instead, explicitly write the line terminator the protocol requires.
LineNumberReader
LineNumberReader is a subclass of BufferedReader that keeps track of the current line number. This can be retrieved at any time with the getLineNumber() method:
public int getLineNumber()
By default, the first line number is 0. However, the number of the current line and all subsequent lines can be changed with the setLineNumber() method:
public void setLineNumber(int lineNumber)
This method adjusts only the line numbers that getLineNumber() reports. It does not change the point at which the stream is read.
The LineNumberReader's readLine() method shares the same bug as BufferedReader's and DataInputStream's, and is not suitable for network programming. However, the line numbers are also tracked if you use only the regular read() methods, and these do not share that bug. Besides these methods and the usual Reader methods, LineNumberReader has only these two constructors:
public LineNumberReader(Reader in)
public LineNumberReader(Reader in, int bufferSize)
Since LineNumberReader is a subclass of BufferedReader, it has an internal character buffer whose size can be set with the second constructor. The default size is 8,192 characters.
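The counting can be traced with plain read() calls on a StringReader. A minimal sketch; LineNumbers and demo() are hypothetical names. It stops before end of stream on purpose, since how a final unterminated line is counted at end of stream has differed between JDK releases:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

public class LineNumbers {
    public static int[] demo() throws IOException {
        LineNumberReader r = new LineNumberReader(new StringReader("a\nb\r\nc\n"));
        int[] seen = new int[3];
        r.read();                       // 'a'
        r.read();                       // '\n' ends the first line
        seen[0] = r.getLineNumber();    // 1
        r.setLineNumber(10);            // relabels lines; read position unchanged
        r.read();                       // 'b'
        r.read();                       // '\r', returned as '\n'; "\r\n" counts once
        seen[1] = r.getLineNumber();    // 11
        r.read();                       // 'c' (the '\n' after '\r' is skipped)
        seen[2] = r.getLineNumber();    // still 11
        return seen;
    }

    public static void main(String[] args) throws IOException {
        int[] s = demo();
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // 1 11 11
    }
}
```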
PushbackReader
The PushbackReader class is the mirror image of the PushbackInputStream class. As usual, the main difference is that it pushes back chars rather than bytes. It provides three unread() methods that push characters onto the reader's input buffer:
public void unread(int c) throws IOException
public void unread(char[] text) throws IOException
public void unread(char[] text, int offset, int length) throws IOException
The first unread() method pushes a single character onto the reader. The second pushes an array of characters. The third pushes the specified subarray of characters, starting with text[offset] and continuing through text[offset+length-1].
By default, the size of the pushback buffer is only one character. However, the size can be adjusted in the second constructor:
public PushbackReader(Reader in)
public PushbackReader(Reader in, int bufferSize)
Trying to unread more characters than the buffer will hold throws an IOException.
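A common use of a one-character pushback buffer is peeking at the next character, for instance to see whether a carriage return is followed by a linefeed. A minimal sketch; PeekDemo and its methods are hypothetical names:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PeekDemo {
    // Reads one character and pushes it back, so the next read() sees it again.
    public static int peek(PushbackReader in) throws IOException {
        int c = in.read();
        if (c != -1) in.unread(c);
        return c;
    }

    // With the default one-character buffer, a second unread()
    // without an intervening read() overflows and throws IOException.
    public static boolean overflowThrows() {
        PushbackReader in = new PushbackReader(new StringReader("xyz"));
        try {
            in.unread('a');
            in.unread('b');
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("xyz"));
        System.out.println((char) peek(in));    // x
        System.out.println((char) in.read());   // x again
        System.out.println(overflowThrows());   // true
    }
}
```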
PrintWriter
The PrintWriter class is a replacement for Java 1.0's PrintStream class that properly handles multibyte character sets and international text. Sun originally planned to deprecate PrintStream in favor of PrintWriter but backed off when it realized this step would invalidate too much existing code, especially code that depended on System.out. Nonetheless, new code should use PrintWriter instead of PrintStream. Aside from the constructors, the PrintWriter class has an almost identical collection of methods to PrintStream. These include:
public PrintWriter(Writer out)
public PrintWriter(Writer out, boolean autoFlush)
public PrintWriter(OutputStream out)
public PrintWriter(OutputStream out, boolean autoFlush)
public void flush( )
public void close( )
public boolean checkError( )
protected void setError( )
public void write(int c)
public void write(char[] text, int offset, int length)
public void write(char[] text)
public void write(String s, int offset, int length)
public void write(String s)
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println( )
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)
Most of these methods behave the same for PrintWriter as they do for PrintStream. The exceptions are the write() methods, which write characters rather than bytes; also, if the underlying writer properly handles character set conversion, so do all the methods of the PrintWriter. This is an improvement over the noninternationalizable PrintStream class, but it's still not good enough for network programming. PrintWriter still has the problems of platform dependency and minimal error reporting that plague PrintStream.
It isn't hard to write a PrintWriter-like class that does work for network programming. You simply have to require the programmer to specify a line separator and let the IOExceptions fall where they may. Example 4-2 demonstrates. Notice that all the constructors require an explicit line separator to be provided.
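Both problems with PrintWriter can be seen in a few lines. A minimal sketch; SilentFailure is a hypothetical class and its nested BrokenWriter stands in for a dead network connection. println() never throws; the failure is only recorded for checkError(). And println() appends whatever System.lineSeparator() is on the local machine:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

public class SilentFailure {
    // Hypothetical Writer that fails every operation, like a broken socket.
    static class BrokenWriter extends Writer {
        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("connection reset");
        }

        @Override
        public void flush() throws IOException {
            throw new IOException("connection reset");
        }

        @Override
        public void close() {
        }
    }

    // Returns true: the IOException was swallowed, not thrown to the caller.
    public static boolean printlnFailsSilently() {
        PrintWriter out = new PrintWriter(new BrokenWriter());
        out.println("QUIT");    // no exception reaches this code
        return out.checkError(); // the only trace of the failure
    }

    // Returns s followed by the platform's line separator.
    public static String printlnOutput(String s) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println(s);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(printlnFailsSilently());         // true
        System.out.println(printlnOutput("QUIT").length()); // 5 on Unix, 6 on Windows
    }
}
```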
Example 4-2. SafePrintWriter
/*
* @(#)SafePrintWriter.java 1.0 04/06/28
*
* Placed in the public domain
* No rights reserved.
*/
package com.macfaq.io;
import java.io.*;
/**
* @version 1.1, 2004-06-28
* @author Elliotte Rusty Harold
* @since Java Network Programming, 2nd version
*/
public class SafePrintWriter extends Writer {
protected Writer out;
private boolean autoFlush = false;
private String lineSeparator;
private boolean closed = false;
public SafePrintWriter(Writer out, String lineSeparator) {
this(out, false, lineSeparator);
}
public SafePrintWriter(Writer out, char lineSeparator) {
this(out, false, String.valueOf(lineSeparator));
}
public SafePrintWriter(Writer out, boolean autoFlush, String lineSeparator) {
super(out);
this.out = out;
this.autoFlush = autoFlush;
if (lineSeparator == null) {
throw new NullPointerException("Null line separator");
}
this.lineSeparator = lineSeparator;
}
public SafePrintWriter(OutputStream out, boolean autoFlush, String encoding, String lineSeparator) throws UnsupportedEncodingException {
this(new OutputStreamWriter(out, encoding), autoFlush, lineSeparator);
}
public void flush( ) throws IOException {
synchronized (lock) {
if (closed) throw new IOException("Stream closed");
out.flush( );
}
}
public void close( ) throws IOException {
try {
this.flush( );
}
catch (IOException ex) {
}
synchronized (lock) {
out.close( );
this.closed = true;
}
}
public void write(int c) throws IOException {
synchronized (lock) {
if (closed) throw new IOException("Stream closed");
out.write(c);
}
}
public void write(char[] text, int offset, int length) throws IOException {
synchronized (lock) {
if (closed) throw new IOException("Stream closed");
out.write(text, offset, length);
}
}
public void write(char[] text) throws IOException {
synchronized (lock) {
if (closed) throw new IOException("Stream closed");
out.write(text, 0, text.length);
}
}
public void write(String s, int offset, int length) throws IOException {
synchronized (lock) {
if (closed) throw new IOException("Stream closed");
out.write(s, offset, length);
}
}
public void print(boolean b) throws IOException {
if (b) this.write("true");
else this.write("false");
}
public void println(boolean b) throws IOException {
if (b) this.write("true");
else this.write("false");
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(char c) throws IOException {
this.write(String.valueOf(c));
}
public void println(char c) throws IOException {
this.write(String.valueOf(c));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(int i) throws IOException {
this.write(String.valueOf(i));
}
public void println(int i) throws IOException {
this.write(String.valueOf(i));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(long l) throws IOException {
this.write(String.valueOf(l));
}
public void println(long l) throws IOException {
this.write(String.valueOf(l));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(float f) throws IOException {
this.write(String.valueOf(f));
}
public void println(float f) throws IOException {
this.write(String.valueOf(f));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(double d) throws IOException {
this.write(String.valueOf(d));
}
public void println(double d) throws IOException {
this.write(String.valueOf(d));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(char[] text) throws IOException {
this.write(text);
}
public void println(char[] text) throws IOException {
this.write(text);
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(String s) throws IOException {
if (s == null) this.write("null");
else this.write(s);
}
public void println(String s) throws IOException {
if (s == null) this.write("null");
else this.write(s);
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void print(Object o) throws IOException {
if (o == null) this.write("null");
else this.write(o.toString( ));
}
public void println(Object o) throws IOException {
if (o == null) this.write("null");
else this.write(o.toString( ));
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
public void println( ) throws IOException {
this.write(lineSeparator);
if (autoFlush) out.flush( );
}
}
Like PrintWriter, this class extends Writer rather than FilterWriter. It could extend FilterWriter instead; however, this would save only one field and one line of code, since this class needs to override every single method in FilterWriter (close(), flush(), and all three write() methods). The reason for this is twofold. First, a print writer has to be much more careful about synchronization than the FilterWriter class is. Second, some of the classes that may be used as an underlying Writer for this class, notably CharArrayWriter, do not implement the proper semantics for close() and allow further writes to take place even after the writer is closed. Consequently, programmers have to handle the checks for whether the stream is closed in this class rather than relying on the underlying Writer out to do it for them.
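The CharArrayWriter behavior mentioned above is easy to confirm: its close() is documented to have no effect, so writes after closing still succeed. A minimal sketch; CharArrayWriterClose is a hypothetical class name:

```java
import java.io.CharArrayWriter;

public class CharArrayWriterClose {
    public static String writeAfterClose() {
        CharArrayWriter w = new CharArrayWriter();
        w.write("before", 0, 6);
        w.close();               // has no effect in this class
        w.write("after", 0, 5);  // succeeds anyway; no IOException
        return w.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeAfterClose()); // beforeafter
    }
}
```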
This chapter has been a whirlwind tour of the java.io package, covering the bare minimum you need to know to write network programs. For a more detailed and comprehensive look with many more examples, check out my other tutorial in this series, Java I/O (Oracle).