Data Compression

The java.util.zip package contains classes you can use for data compression. In this section, we'll talk about how to use these classes. We'll also present two useful example programs that build on what you have learned about streams and files. The classes in the java.util.zip package support two widespread compression formats: GZIP and ZIP.

Archives and Compressed Data

The java.util.zip package provides two filter streams for writing compressed data. The GZIPOutputStream is for writing data in GZIP compressed format. The ZipOutputStream is for writing compressed ZIP archives, which can contain one or many files. To write compressed data in the GZIP format, simply wrap a GZIPOutputStream around an underlying stream and write to it. The following is a complete example that shows how to compress a file using the GZIP format:

 //file: GZip.java
 import java.io.*;
 import java.util.zip.*;
 public class GZip {
     public static int sChunk = 8192;
     public static void main(String[] args) {
         if (args.length != 1) {
             System.out.println("Usage: GZip source");
             return;
         }
         // create output stream
         String zipname = args[0] + ".gz";
         GZIPOutputStream zipout;
         try {
             FileOutputStream out = new FileOutputStream(zipname);
             zipout = new GZIPOutputStream(out);
         }
         catch (IOException e) {
             System.out.println("Couldn't create " + zipname + ".");
             return;
         }
         byte[] buffer = new byte[sChunk];
         // compress the file
         try {
             FileInputStream in = new FileInputStream(args[0]);
             int length;
             while ((length = in.read(buffer, 0, sChunk)) != -1)
                 zipout.write(buffer, 0, length);
             in.close( );
         }
         catch (IOException e) {
             System.out.println("Couldn't compress " + args[0] + ".");
         }
         try { zipout.close( ); }
         catch (IOException e) {}
     }
 }


First, we check to make sure we have a command-line argument representing a filename. We then construct a GZIPOutputStream wrapped around a FileOutputStream for the given filename with the .gz suffix appended. With this in place, we open the source file, read chunks of data from it, and write them into the GZIPOutputStream. Finally, we clean up by closing our open streams.

Writing data to a ZIP archive file is a little more involved but still quite manageable. While a GZIP file contains only one compressed file, a ZIP file is actually a collection of files, some (or all) of which may be compressed. Each item in the ZIP file is represented by a ZipEntry object. When writing to a ZipOutputStream, you'll need to call putNextEntry( ) before writing the data for each item. The following example shows how to create a ZipOutputStream. You'll notice it's just like creating a GZIPOutputStream:

 ZipOutputStream zipout;
 try {
     FileOutputStream out = new FileOutputStream("archive.zip");
     zipout = new ZipOutputStream(out);
 }
 catch (IOException e) {}


Let's say we have two files we want to write into this archive. Before we begin writing, we need to call putNextEntry( ). We create a simple entry with just a name. You can set other fields in ZipEntry, but most of the time, you won't need to bother with them.

 try {
     ZipEntry first = new ZipEntry("First");
     zipout.putNextEntry(first);
     // write the data for the first item here
     ZipEntry second = new ZipEntry("Second");
     zipout.putNextEntry(second);
     // write the data for the second item here
     . . .
 }
 catch (IOException e) {}
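
To see the whole sequence in one place, here is a minimal sketch that writes two items into a ZIP archive. It makes a few assumptions that are not part of the discussion above: the class name ZipWriteSketch is hypothetical, the archive is built in memory with a ByteArrayOutputStream rather than in a file, and the item contents are placeholder strings. It also calls closeEntry( ) after each item to make the end of the item explicit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical class name, used only for this illustration.
public class ZipWriteSketch {

    // Builds a two-item ZIP archive in memory and returns its bytes.
    public static byte[] buildArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes the stream, which also finishes the archive
        try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
            // announce the first item, then write its data
            zipout.putNextEntry(new ZipEntry("First"));
            zipout.write("contents of the first item".getBytes(StandardCharsets.UTF_8));
            zipout.closeEntry();

            // announce the second item, then write its data
            zipout.putNextEntry(new ZipEntry("Second"));
            zipout.write("contents of the second item".getBytes(StandardCharsets.UTF_8));
            zipout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = buildArchive();
        System.out.println("Archive size: " + archive.length + " bytes");
    }
}
```

To write a real archive instead, you could replace the ByteArrayOutputStream with a FileOutputStream, as in the earlier fragment.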


Decompressing Data

To decompress data in the GZIP format, simply wrap a GZIPInputStream around an underlying FileInputStream and read from it. The following is a complete example that shows how to decompress a GZIP file:

 //file: GUnzip.java
 import java.io.*;
 import java.util.zip.*;
 public class GUnzip {
     public static int sChunk = 8192;
     public static void main(String[] args) {
         if (args.length != 1) {
             System.out.println("Usage: GUnzip source");
             return;
         }
         // create input stream
         String zipname, source;
         if (args[0].endsWith(".gz")) {
             zipname = args[0];
             source = args[0].substring(0, args[0].length( ) - 3);
         }
         else {
             zipname = args[0] + ".gz";
             source = args[0];
         }
         GZIPInputStream zipin;
         try {
             FileInputStream in = new FileInputStream(zipname);
             zipin = new GZIPInputStream(in);
         }
         catch (IOException e) {
             System.out.println("Couldn't open " + zipname + ".");
             return;
         }
         byte[] buffer = new byte[sChunk];
         // decompress the file
         try {
             FileOutputStream out = new FileOutputStream(source);
             int length;
             while ((length = zipin.read(buffer, 0, sChunk)) != -1)
                 out.write(buffer, 0, length);
             out.close( );
         }
         catch (IOException e) {
             System.out.println("Couldn't decompress " + args[0] + ".");
         }
         try { zipin.close( ); }
         catch (IOException e) {}
     }
 }


First, we check to make sure we have a command-line argument representing a filename. If the argument ends with .gz, we figure out what the filename for the uncompressed file should be. Otherwise, we use the given argument and assume the compressed file has the .gz suffix. Then we construct a GZIPInputStream wrapped around a FileInputStream, representing the compressed file. With this in place, we open the target file. We read chunks of data from the GZIPInputStream and write them into the target file. Finally, we clean up by closing our open streams.

Again, the ZIP archive presents a little more complexity than the GZIP file. When reading from a ZipInputStream, you should call getNextEntry( ) before reading each item. When getNextEntry( ) returns null, there are no more items to read. The following example shows how to create a ZipInputStream. You'll notice it's just like creating a GZIPInputStream:

 ZipInputStream zipin;
 try {
     FileInputStream in = new FileInputStream("archive.zip");
     zipin = new ZipInputStream(in);
 }
 catch (IOException e) {}


Suppose we want to read two files from this archive. Before we begin reading, we need to call getNextEntry( ). At the very least, the entry gives us the name of the item we are reading from the archive:

 try {
     ZipEntry first = zipin.getNextEntry( );
 }
 catch (IOException e) {}


At this point, you can read the contents of the first item in the archive. When you come to the end of the item, the read( ) method returns -1. Now you can call getNextEntry( ) again to read the second item from the archive:

 try {
     ZipEntry second = zipin.getNextEntry( );
 }
 catch (IOException e) {}


If you call getNextEntry( ) and it returns null, there are no more items and you have reached the end of the archive.
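
When you don't know in advance how many items an archive holds, the calls above can be folded into a loop that runs until getNextEntry( ) returns null. The following is a minimal sketch under some assumptions of our own: the class name ZipReadSketch and the helper sampleArchive( ) are hypothetical, the archive is built and read in memory so the example is self-contained, and each item is assumed to contain UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class name, used only for this illustration.
public class ZipReadSketch {

    // Reads every item from a ZIP stream and returns name -> contents,
    // in archive order. Assumes each item holds UTF-8 text.
    public static Map<String, String> readAll(InputStream source) throws IOException {
        Map<String, String> items = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zipin = new ZipInputStream(source)) {
            ZipEntry entry;
            // a null entry means the end of the archive
            while ((entry = zipin.getNextEntry()) != null) {
                ByteArrayOutputStream contents = new ByteArrayOutputStream();
                int length;
                // read() returns -1 at the end of the current item
                while ((length = zipin.read(buffer, 0, buffer.length)) != -1)
                    contents.write(buffer, 0, length);
                items.put(entry.getName(),
                          new String(contents.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return items;
    }

    // Hypothetical helper: builds a small two-item archive to read back.
    public static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
            zipout.putNextEntry(new ZipEntry("First"));
            zipout.write("one".getBytes(StandardCharsets.UTF_8));
            zipout.putNextEntry(new ZipEntry("Second"));
            zipout.write("two".getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> items = readAll(new ByteArrayInputStream(sampleArchive()));
        for (Map.Entry<String, String> item : items.entrySet())
            System.out.println(item.getKey() + ": " + item.getValue());
    }
}
```

To read an archive from disk, you could pass a FileInputStream to readAll( ) in place of the ByteArrayInputStream.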
