A Brief History of zlib
The origins of zlib can be found in the history of Info-ZIP. Info-ZIP
is a loosely organized group of programmers who give the following reason
for their existence:
Info-ZIP's purpose is to provide free, portable, high-quality versions
of the Zip and UnZip compressor-archiver utilities that are compatible
with the DOS-based PKZIP by PKWARE, Inc.
These free versions of Zip and UnZip are world class programs, and are
in wide use on platforms ranging from the orphaned Amiga through MS-DOS
PCs up to high powered RISC workstations. But these programs are designed
to be used as command line utilities, not as library routines. People have
found that porting the Info-ZIP source into an application could be a grueling
exercise.
Fortunately for all of us, two of the Info-ZIP gurus took it upon themselves
to solve this problem. Mark Adler and Jean-loup Gailly created zlib on
their own, a set of library routines that provide a safe, free,
and unpatented implementation of the deflate compression algorithm.
One of the driving reasons behind zlib's creation was the need for a
compressor for PNG format graphics. After Unisys belatedly began asserting
its patent rights to LZW compression, programmers all over the world
were thrown into a panic over the prospect of paying royalties on their
GIF decoding programs. The PNG standard was created to provide an unencumbered
format for graphics interchange. The zlib version of the deflate algorithm
was embraced by PNG developers, not only because it was free, but also
because it compressed better than the LZW compressor used in GIF files.
zlib turns out to be good for more than graphics developers, however.
The deflate algorithm makes an excellent general purpose compressor, and
as such can be incorporated into all sorts of different software. For example,
I use zlib as the compression engine in Greenleaf's ArchiveLib, a data
compression library that works with ZIP archives. Its performance and
compatibility meant I didn't have to reinvent the wheel, saving precious
months of development time.
zlib's interface
As a library developer, I know that interfaces make or break a library.
Performance issues are important, but if an awkward API makes it impossible
to integrate a library into your program, you've got a problem.
zlib's interface is confined to just a few simple function calls. The
entire state of a given compression or decompression session is encapsulated
in a C structure of type z_stream, whose definition is shown in
Figure 1.
typedef struct z_stream_s {
    Bytef    *next_in;   /* next input byte */
    uInt     avail_in;   /* number of bytes available at next_in */
    uLong    total_in;   /* count of input bytes read so far */
    Bytef    *next_out;  /* next output byte should be put there */
    uInt     avail_out;  /* remaining free space at next_out */
    uLong    total_out;  /* count of bytes output so far */
    char     *msg;       /* last error message, NULL if no error */
    struct internal_state *state; /* not visible by applications */
    alloc_func zalloc;   /* used to allocate the internal state */
    free_func  zfree;    /* used to free the internal state */
    voidpf     opaque;   /* private data passed to zalloc and zfree */
    int      data_type;  /* best guess about the data: ascii or binary */
    uLong    adler;      /* adler32 value of the uncompressed data */
    uLong    reserved;   /* reserved for future use */
} z_stream;
Figure 1
The z_stream object definition
Using the library to compress or decompress a file or other data object
consists of three main steps:
- Creating a z_stream object.
- Processing input and output, using the z_stream object to communicate
with zlib.
- Destroying the z_stream object.
An overview of the process is shown in Figure 2.
Figure 2
The compression or decompression process
Steps 1 and 3 of the compression process are done using conventional
function calls. The zlib API, documented in header file zlib.h,
prototypes the following functions for initialization and termination of
the compression or decompression process:
- deflateInit()
- inflateInit()
- deflateEnd()
- inflateEnd()
Step 2 is done via repeated calls to either inflate() or deflate(),
passing the z_stream object as a parameter. The entire state of
the process is contained in that object, so there are no global flags or
variables, which allows the library to be completely reentrant. Storing
the state of the process in a single object also cuts down on the number
of parameters that must be passed to the API functions.
When performing compression or decompression, zlib doesn't perform any
I/O on its own. Instead, it reads data from an input buffer pointer that
you supply in the z_stream object. You simply set up a pointer to the next
block of input data in member next_in, and place the number of available
bytes in the avail_in member. Likewise, zlib writes its output data to
a memory buffer you set up in the next_out member. As it writes output
bytes, zlib decrements the avail_out member until it drops to 0.
Given this interface, Step 2 of the compression process for an input
file and an output file might look something like this:
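z_stream z;
unsigned char input_buffer[ 1024 ];
unsigned char output_buffer[ 1024 ];
FILE *fin;
FILE *fout;
int status;
int flush = Z_NO_FLUSH;
...
/* z has already been initialized with deflateInit() */
z.avail_in = 0;
z.next_out = output_buffer;
z.avail_out = 1024;
for ( ; ; ) {
    if ( z.avail_in == 0 && flush == Z_NO_FLUSH ) {
        z.next_in = input_buffer;
        z.avail_in = fread( input_buffer, 1, 1024, fin );
        if ( z.avail_in == 0 )
            flush = Z_FINISH;   /* end of input: drain what zlib still holds */
    }
    status = deflate( &z, flush );
    int count = 1024 - z.avail_out;
    if ( count )
        fwrite( output_buffer, 1, count, fout );
    z.next_out = output_buffer;
    z.avail_out = 1024;
    if ( status != Z_OK )       /* Z_STREAM_END when done, or an error code */
        break;
}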
Figure 3
The code to implement file compression
This method of handling I/O frees zlib from having to implement system
dependent read and write code, and it ensures that you can use the library
to compress any sort of input stream, not just files. It's simply a matter
of replacing the wrapper code shown above with a version customized for
your data stream.
Wrapping it up
zlib's versatility is one of its strengths, but I don't always need
all that flexibility. For example, to perform the simple file compression
task Scott asked about at the start of this article, it would be nice to
just be able to call a single function to compress a file, and another
function to decompress. To make this possible, I created a wrapper class
called ZlibEngine.
ZlibEngine provides a simple API that automates the compression and
decompression of files and uses virtual functions to let you customize
your user interface to zlib. The class definition is shown in its entirety
in Figure 4. There are two different groups of members that are important
to you in ZlibEngine. The first is the set of functions providing the calling
interface to the engine. The second is the set of functions and data members
used to create a user interface that is active during the compression process.
class ZlibEngine : public z_stream {
    public :
        ZlibEngine();
        int compress( const char *input,
                      const char *output,
                      int level = 6 );
        int decompress( const char *input,
                        const char *output );
        void set_abort_flag( int i ){ m_AbortFlag = i; }
    protected :
        int percent();
        int load_input();
        int flush_output();
    protected :
        virtual void progress( int percent ){};
        virtual void status( char *message ){};
    protected :
        int m_AbortFlag;
        FILE *fin;
        FILE *fout;
        long length;
        int err;
        enum { input_length = 4096 };
        unsigned char input_buffer[ input_length ];
        enum { output_length = 4096 };
        unsigned char output_buffer[ output_length ];
};
Figure 4
The ZlibEngine wrapper class
The Calling API
Three C++ functions implement the API needed to perform simple
compression and decompression. The first is the constructor, which must
run before you can use the engine. Since ZlibEngine is derived
from the z_stream object used as the interface to zlib, the constructor
is in effect also creating a z_stream object that will be used to
communicate with zlib. In addition, the constructor initializes some of
the z_stream member variables that will be used in either compression
or decompression.
The two remaining functions are nice and simple: compress() compresses
a file using the deflate algorithm. An optional level parameter sets a
compression factor between 9 (maximum compression) and 0 (no compression),
with a default of 6.
decompress() decompresses a file, as you would expect. The compression
level parameter isn't necessary when decompressing, due to the nature of
the deflate algorithm. Both of these functions return an integer status
code, defined in the zlib header file zlib.h. Z_OK is returned
when everything works as expected. Note that I added an additional code,
Z_USER_ABORT, used for an end user abort of the compression or decompression
process.
The wrapper class makes it much easier to compress or decompress files
using zlib. You only need to remember three things:
- Include the header file for the wrapper class, zlibengn.h.
- Construct a ZlibEngine object.
- Call the member functions compress() or decompress()
to do the actual work.
This means you can now perform compression with code this simple:
#include <zlibengn.h>
int foo()
{
    ZlibEngine engine;
    return engine.compress( "INPUT.DAT", "INPUT.DA_");
}
That's about as simple as you could ask for, isn't it?
The User Interface API
The calling API doesn't really make much of a case for creating the
ZlibEngine class. Based on what you've seen so far, the compress()
and decompress() functions don't really need to be members of a
class. In theory, a global compress() function could just instantiate
a z_stream object when called, without the caller even being aware
of it.
The reason for creating this engine class is found in a completely different
area: the user interface. It's really nice to be able to track the progress
of your compression job while it's running. Conventional C libraries have
to make do with callback functions or inflexible standardized routines
in order to provide feedback, but C++ offers a better alternative through
the use of virtual functions.
The ZlibEngine class has two virtual functions that are used
to create a useful user interface: progress() is called periodically
during the compression or decompression process, with a single integer
argument that tells what percentage of the input file has been processed.
status() is called with status messages during processing.
Both of these virtual functions have access to the ZlibEngine
protected data element, m_AbortFlag. Setting this flag to a non-zero
value will cause the compression or decompression routine to abort immediately.
This easily takes care of another sticky user interface problem found when
using library code.
Writing your own user interface then becomes a simple exercise. You
simply derive a new class from ZlibEngine, and define your own versions
of one or both of these virtual functions. Instantiate an object of your
class instead of ZlibEngine, and your user interface can be as spiffy
and responsive as you like!
Command line compression
I wrote a simple command line test program to demonstrate the use of
class ZlibEngine. zlibtest.cpp does a simple compress/decompress
cycle of the input file specified on the command line. I implement a progress
function that simply prints out the current percent towards completion
as the file is processed:
class MyZlibEngine : public ZlibEngine {
    public :
        void progress( int percent )
        {
            printf( "%3d%%\b\b\b\b", percent );
            /* kbhit() and getch() come from <conio.h> on DOS and Windows compilers */
            if ( kbhit() ) {
                getch();
                m_AbortFlag = 1;
            }
        }
};
Since class ZlibEngine is so simple, the derived class doesn't
even have to implement a constructor or destructor. The derived version
of progress() is able to provide user feedback as well as an abort
function with just a few lines of code. zlibtest.cpp is shown in
its entirety in Listing 1.
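The same pattern can be tried without the article's sources or a DOS
console. In the sketch below, DemoEngine is a hypothetical stand-in for
ZlibEngine that does no compression at all; it only reproduces the idea of
a work loop that calls virtual progress() and status() hooks and checks an
abort flag. RecordingEngine plays the role of MyZlibEngine. Both class
names, and the run() function, are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ZlibEngine: no compression, just the hook pattern.
class DemoEngine {
public:
    DemoEngine() : m_AbortFlag(0) {}
    virtual ~DemoEngine() {}

    // Pretend to process `total` blocks; returns how many were processed.
    int run(int total)
    {
        assert(total > 0);
        int done = 0;
        while (done < total && !m_AbortFlag) {
            ++done;                          // one block of "work"
            progress(done * 100 / total);    // report percent complete
        }
        status(m_AbortFlag ? "aborted" : "done");
        return done;
    }

protected:
    virtual void progress(int /*percent*/) {}
    virtual void status(const char * /*message*/) {}
    int m_AbortFlag;
};

// A user interface is just a derived class that overrides the hooks.
class RecordingEngine : public DemoEngine {
public:
    explicit RecordingEngine(int abort_at) : m_AbortAt(abort_at) {}

    std::vector<int> percents;   // every value passed to progress()
    std::string last_status;     // last message passed to status()

protected:
    void progress(int percent)
    {
        percents.push_back(percent);
        if (percent >= m_AbortAt)
            m_AbortFlag = 1;     // ask the work loop to stop
    }
    void status(const char *message) { last_status = message; }

private:
    int m_AbortAt;
};
```

Constructing RecordingEngine with an abort threshold above 100 lets a run
finish normally, while a threshold such as 50 stops the loop halfway. A
real interface would update a progress bar or poll for a keypress in
progress() instead of recording values.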
The OCX
To provide a slightly more complicated test of class ZlibEngine,
I created a 32 bit OCX using Visual C++ 4.1. The interface to an OCX is
defined in terms of methods, events, and properties. ZlibTool.ocx
has the following interface:
Properties:
- InputFile
- OutputFile
- Level
- Status
Methods:
- Compress()
- Decompress()
- Abort()
Events:
- Progress()
(Note that I chose to pass status information from the OCX using a property,
not an event.)
ZlibTool.ocx is a control derived from a standard Win32 progress
bar. The progress bar gets updated automatically while compressing or
decompressing, so you get some user interface functionality for free. Using
it with Visual Basic 4.0 or Delphi 2.0 becomes a real breeze. After registering
the OCX, you can drop a copy of it onto your form and use it with a minimal
amount of coding.
Both the source code for the OCX and a sample Delphi 2.0 program are
available on the DDJ listing service. A screen shot of the Delphi program
in action is shown in Figure 5.
Figure 5
The Delphi 2.0 OCX test program
Reference material
The source code that accompanies this article can be downloaded from
this Web page. It contains the following source code collections:
- The complete source for zlib
- The Visual C++ 4.1 project for the ZlibTool OCX
- The Delphi 2.0 project that exercises the OCX
- The Console test program that exercises the ZlibEngine class
Each of the subdirectories contains a README.TXT file with documentation
describing how to build and use the programs.
The source is split into two archives:
- zlibtool.zip: All source code and the OCX file.
- zlibdll.zip: The supporting MFC and VC++ DLLs. Many people will already
have these files on their systems: MFC40.DLL, MSVCRT40.DLL, and OLEPRO32.DLL.
I haven't discussed the zlib code itself in this article. The best places
to start gathering information about how to use zlib and the Info-ZIP products
are their home pages. Both pages also have links to the most current
versions of their source code.
Once you download the zlib code, the quick start documentation is
found in source file zlib.h. If you cook up any useful code that
uses zlib, you might want to forward copies to Greg
Roelofs for inclusion on the zlib home page. Greg maintains the zlib
pages, and you can reach him via links found there.
Feel-good plug
zlib can do a lot more than just compress files. Its versatile interface
can be used for streaming I/O, in-memory compression, and more. Since Jean-loup
Gailly and Mark Adler were good enough to make this capable tool available
to the public, it only makes sense that we take advantage of it. I know
I have, and I encourage you to do the same.