Stars Dev Logo
Stars
Home
About
Forum
Email
Links
Products
Pong Ultra
The Terror
Projects
Developer Section
Affiliates
Pong Ultra
Qbasicnews
Ultimate Qb
Phat Code
BasedEdge
Misc
Valid HTML 4.01!
Valid CSS!
Quick Info

You can download the tutorial and the source code here: Click Here to Download.
This is the first version, if any changes occur to the tutorial I will mention it here. Enjoy the tutorial, it took me a long time to write.

The Tutorial

LZW Compression Tutorial Version 1.01


By Martin Zolnieryk aka Hard Rock.
A Stars Dev Company Production (c) 2004
This Version: Aug 5,2004
Email: Hard_rock_2@yahoo.com

ChangeLog


Version 1.01
  • Lots of Spelling/Grammer Error Fixes
Version 1.0
  • First Release
Table of Contents

Introduction

1a. About

Hey, welcome to my first tutorial. I've decided to write this tutorial on compression, specifically LZW and GIF, as the majority of tutorials on this subject I found, were hard to understand, therefore I started this tutorial to hopefully fill that gap. Compression admittedly takes some time to master so I've tried to make things as simple as I could. Once you've learned it however, you'll find it really simple. I've included a little list of references at the end of the tutorial, so hopefully if I missed something, or you're still unclear, you have some places to look. And if anyone finds any errors in this tutorial, email them to me at hard_rock_2@yahoo.com and I will fix them as soon as I can. If you don't find any errors, or enjoyed reading this, then email me anyway since I like feedback. Maybe you'll even motivate me to write more tutorials! Of course source code is included with this tutorial, with sources of C/C++ and Java.

1b.Compression

Now letís start off. There are millions practical uses of compression, and a common one being the method of transferring data over the Internet. For that most Windows users use either .zip or self extracting/installing .exe files both of which are compressed. (These are not all of course; there are hundreds of compression formats). Now that we know how to use it and what it is, we can start!

1c.LZW

The compression format that will be discussed in this tutorial, is Lempel-Ziv-Welch or LZW compression. Rather than fill you with a history of the format we'll get right down to how to use it within your own programs. Let's discuss LZW compression, how does it work? Basically LZW creates tables of strings while compressing/decompressing data, and then later outputs the ID of the string rather than the string itself saving space. Decompression reads this and it to build a table of strings as it reads, saving the compression from needing to write large string tables at the beginning of the file, thus making it very effective. If this confuses you, don't worry we have the whole tutorial to explain it, as well as many link to online references. Well let get started and focus on how it works, worrying about the technicalities later.

Compression

2a.Psuedo Code

Now we will learn how to actually compress data! Wohooo! Are we excited yet? Let's show some simple pseudo code and then we'll dissect it to see how it works:

Pseudo Code

Initialize String Table
Add next char from charstream to String_Buffer
START LOOP here
	get Next_Char from charstream
	if String_Buffer + Next_Char is in string table
		add Next_Char to String_Buffer
	else
		output code for String_Buffer
		add String_Buffer + Next_Char code to table
		clear String_Buffer and equal it to Next_Char
END LOOP when at end of charstream

2b.How it Works

Whoa, that looks sweet, it looks so easy! This is more or less how you'd actually do it, although there are a few minor gripes, but this is for explanation purposes only. Rather then try to explain piece by piece how it works, why not use an example? Let say we have a set of words say:
__________________________
CatCatInTheHatAndTheRat
__________________________
(This makes absolutely no sense I know, but it'll provide a good test bed for our code here, capitalization used to make reader friendly)
Here are the steps are program would take:

  1. C is added to the String_Buffer.
  2. Now it reads Next_Char, Next_Char = 'a', "Ca" and it is not in the string table so we output "C" to the file. "Ca" is given the code of 256 (0-255 are ASCII codes).
  3. String_Buffer is now set to only equal Next_Char, "A"
  4. Now it reads Next_Char, Next_Char = 't', "at" is not in string table so we output "a" to the file. "at" is given the code of 257.
  5. Now it reads Next_Char, Next_Char = 'C', "tC" is not in string table so we output "t" to the file. "tC" is given the code of 258.
  6. Whoa, Wait a second. Now when Next_Char is read, Next_Char = 'a', "Ca" and it is IN the string table! So now String_Buffer = "Ca"
  7. Now it reads Next_Char, Next_Char = 't', "Cat" is not in string table so we output "256" to the file. "Cat" is given the code of 259.
This is basically how it would go on, until the entire file is complete! Cool eh! Now the Results are:
__________________________________________
CatCatInTheHatAndTheRat (uncompressed)
__________________________________________
Cat256TInTheH257And263eR257 (compressed)
__________________________________________
Sweet huh? We dropped, what 4 whole bytes! But realize that because when you compress, you have to use more bits per character to write out everything in order to write a code. Here a minimum of 9 bits (up to code 511) is required so in reality we haven't dropped 4 bytes, using 9 bits we would have dropped a single byte. This type of compression is a lot more effective when you have a larger amount of data, so the overhead goes away and the compression really starts to work! But really, would you normally try compressing a 23 byte files? I think not. Still a little confused? Letís take a look at a table of values:

2c. Table

String_	Next_	Code	Code	Output
Buffer	Char		Value
C	a	256	Ca	C
a	t	257	at	a
t	C	258	tC	t
Ca	t	259	Cat	256
t	I	260	tI	t
I	n	261	In	I
n	t	262	nt	n
T	h	263	Th	T
h	e	264	he	h
e	H	265	eH	e
H	a	266	Ha	H
at	a	267	atA	257
A	n	268	An	A
n	d	269	nd	n
d	T	270	dT	d
Th	e	271	Th	263
e	R	272	eR	e
R	a	273	Ra	R
at	-	---	---	257
You have to only look at the table and see how easy it really is to use this compression, itís not that complicated right? And be sure to check out the source code to see exactly how it works! Right now letís march onto...

Decompression

3a.How it works

Were moving along at blinding speed here! Compression of course, can be pretty useless if you canít use the compressed data, and that's what decompression is for, so we can use that now compressed data, so lets get started. Remember those cool string tables you saw earlier aren't saved, so once again we'll have to generate them as we go. Also there is an exception we have to check for, is if a code is listed but not in the table, this usually happens with highly repetitive data, and is easily remedied. Letís get some pseudo-code and see what we have:

3b. Pseudo Code

Initialize String Table
get First_Code from charstream
output First_Code
START LOOP here
	get Next_Code from charstream
	if Next_Code is NOT in the string table
		String_Buffer = translated First_code + 
    first byte of First_Code
	else
		String_Buffer = Translation of Next_Code
	add translated First_code + first byte of 
First_Code to the table
	First_Code = Next_Code
	output String_Buffer
END LOOP when at end of charstream
I don't think there will be much explanation needed, now that you know how the compression algorithm works, it shouldn't be too difficult to decipher this code. Just remember that the compressor has already done all the work and we need to continue to build the string table both when compressing and decompressing data.

3c.Example

Letís do a brief explanation of what happens when we run Cat256tInTheHatAnd263eR257 through our little decompressor:

  1. 1. C is read and outputted, First_code = 'C'
  2. 2. Next_Code = 'a', 'a' is in table so it is outputted, and "Ca", 256 is added to the table. First_code = 'a'
  3. 3. Next_Code = 't', 't' is in table so it is outputted, and "at", 257 is added to the table. First_Code = 't'
  4. 4. Next_Code = 256, '256' is in table, so "Ca" is outputted, "tC" , 258 is added to string table, and First_code = 256.
The rest should come really easy. And the check if it the code is not in the table is not used in this example, try entering Data like:
AndAndAndAndAndAndAndAndAndAndAndAndAndAndAndAndAndAndAndAnd
"And" see what happens, to keep this tutorial simple we wonít focus on it to much, if you need to learn more about it have a look at the source and other tutorials in the references section. And that concluded the decompression stage, whoa; you now know how to use LZW compression!

The GIF

4a.What is it?

Okay now well go a cover the GIF format. Briefly to summarize what GIF is, GIF stands for the Graphics Interchange Format and is one of the widely accepted methods of transferring images over the net. However it has begun to show its age (supporting only a maximum of 8 bit or 256 colour images) and is gradually being replaced. So we might as well discuss it while it's not extinct like the dinosaurs. Okay, now obviously we can't just run our decompressor over the image and hope for it to work, the GIF format makes some subtle differences to the code were used too. First there is a header, which looks like so:

4b. The GIF Format

GIF HEADER
6 Bytes		Signature
2 Bytes		ScreenWidth
2 Bytes		ScreenHeight
1 Byte		Depth
1 Byte		Background
1 Byte		Reserved(87a) or Pixel Aspect Ratio
Descriptions:

Signature - Is either GIF87a or GIF89a, each version has a few changes but most are not to important, the most noticeable being the fact GIF89a supports animated images
ScreenWidth - The maximum size of an image in a GIF, if this is an animated GIF, images can be smaller or equal to this size
ScreenHeight - See above
Depth - This is divided into several bits, as follows:
-----------------Packed Fields-----------
Bit 0-2: In 87a these hold the value of pixel, which is the number of bits -1 per pixel in the image. In 89a itís the size of the palette ((pixel+1)^2).
Bit 3 - Reserved in 87a, in 89a its a sort flag telling you that colours in the colour table are listed in decreasing order of important, but its not big deal now(unless your running a 256 colours desktop).
Bit 4-6: These are cs, cs+1 is the colour resolution of the image.
Bit 7 - If set we have a global palette, meaning only one palette in total, if not we have a local palette which means you'll find a palette before each image. (If there are multiple images)
-----------------Packed Fields--------------------
Background - The background colour, only important if you have an animated GIF, and some of the "ScreenWidth or Height" is not full.
Reserved/Aspect Ratio: This really isn't important, but uh, if you really need to know more read the GIF89a from wotsit (see references).
Now the Palette, If there is a global palette (Meaning Bit 7 in depth is set) you read this RIGHT after the header, if not after the image identifier, the palette looks like so:
1 Byte - red intensity, 0-255-----| Repeat for every color
1 Byte - green intensity, 0-255 | in colour table, usually
1 Byte - blue intensity, 0-255----| (cs+1)^2
Remember that the colour map starts with colour 0 and works up! Now these values are right shifted twice (many 8 bit palette handlers use 6 bit channels, or 0-63 colours per channel) so you may need to left shift twice or divide by 4(unless youíre dealing with win32api, which wants 0-255 palette entries). However, it really depends on your palette routines so check your documentation on it. Allegro uses 6 bits, and that's why I brought it up. Now after that, you want to look for the Image Descriptor, which looks like so:
IMAGE DESCRIPTER
1 Byte - 	ID
2 Bytes - 	ImageLeft
2 Bytes - 	ImageTop
2 Bytes - 	Width
2 Bytes -	Height
1 Byte - 	Depth
ID - This is the ',' char or 0x2C in hex.
ImageLeft - How many pixels from the left of the Screenwidth the image starts
ImageTop - Same as above except below top
Width - Width of image in pixels
Height - Height of image in pixels
Depth - Okay this is another packed bit, and is rather different in 89a and 87a. Remember if Bit 7 is not set, you only need to worry about Bit 6 as the others are relevant to Bit 7
-----------------Packed Fields--------------------
Bit 0-2: Pixel, in 87a this is bits per pixel-1, in 89a this is the size of the colour table (pixel+1)^2.
Bit 3-4: Reserved both in 87a and 89a
Bit 5: 87a unused, 89a tells that palette is in order of decreasing importance
Bit 6: Image is interlaced, more on interlaced later
Bit 7: if set means has a local palette (animated GIF) meaning the palette starts, then the image otherwise is false meaning image starts right after.
-----------------Packed Fields--------------------
It is very important to realize that the image descriptor does NOT HAVE TO START AFTER THE HEADER, keep reading through the GIF until you find the ',' character, then you read the image, otherwise your loader may experience problems with certain GIFs.
After that there is the image which looks like so
1 Byte	 		CodeSize
1 Byte 			BufferSize -------|Repeat until
BufferSize Bytes	Buffer     -------|EOI is found
Okay, now CodeSize is equal to the bpp +1. This is because the LZW compression of GIF works intelligently. GIF can compress with a maximum of 12 bits per code, however it starts of using only bpp+1 bits per code, and then when the last code used is equal to codeSize^2 then it increases CodeSize by one. This goes on to a maximum of 12 bits, this allows the GIF to compress small images as well as large images.
Now Buffersize is essentially a number from 0-255 that tells how many bytes you're to read and then you repeat it, pretty pointless if you ask me but remember if you read 0 remember to stop reading the image!
Now onto the GIF buffer, this is the same as the lzw compression we just cover with a few differences. They are:
  • a) Variable compression, we already covered this, basically if only 9 bits (any code less then 511) is needed then 9 is used, when code 511 is read, then the compression switches to 10 bits. This goes up until 12, were the maximum is then reached<
  • b) Now the other variation is the clear code, this bpp ^2 (so if the image has 256 colors, or 8 bit, it would be 256). This can be found anywhere in the buffer, and when it is found, you must clear the tables and codes, and start decompressing with clear tables
  • c) The final reserved code is bpp^2+1, which is EOI, when this is found the image is finished and you can stop reading.
Okay now that you've loaded an image, you can keep reading until you find the ';', or 0x3B found after you've loaded an image, if you find a ',' before it(89a), that means it is an animated GIF, and you'll have to load more images! Now that's it, you can load a GIF, Hopefully I've explained it really clearly, if not, here is a little present, more pseudo code!

4c. GIF Pseudo Code

Load Header
Load Image Descriptor after ID has been found
Clear LZW tables
get First_Code
get Code_Size
DO
 get Next_Code
 If Codeword = Clear Code (CC)
  Clear LZW Table
	get FirstCode
Elseif CodeWord = EOI
   BREAK LOOP 
Else
     if Next_Code is NOT in the string table
		String_Buffer = translated First_code + 
    first byte of First_Code
	else
		String_Buffer = Translation of Next_Code
	add translated First_code + first byte of 
First_Code to the table
	First_Code = Next_Code
	display String_Buffer

  If Code_Used = COdesize ^2 -1  AND CodeSize > 12    
     CodeSize = Codesize +1
LOOP  
Make sure to check for the CC(clear code) and EOI(end of image), and the MINUTE your last code used is the CodeSize ^2-1 (So if it was a CodeSize of 9, 511), then increase your CodeSize. The rest is the same as your standard LZW loading, so ill leave actually writing the whole thing an exercise to you, however, you may look at the sources that come with the tutorial, ported to a couple of languages so take a look at which ever one you prefer!

4d.Advanced GIF topics

Okay there are a few things of the GIF I have not covered yet: First: Interlaced GIFs, as promised, here is the info I was holding out: Interlaced GIFs are similar to normal GIFs, except they are designed to be able to display parts of the image before the image has been downloaded. They do that by encoding not top to bottom, but every 8th line, starting with line 0, then every 8th row, starting with row 4, then every 4th row starting with 2 and then finally, you draw every 2nd line, starting with line 1. Confused well, here is a quick drawing:

Row Number          Drawn on pass
0			1
1			   4
2			  3
3			   4
4			 2
5			   4
6			  3
7			   4
8			1
9			   4
10			  3
11			   4
12			 2
13			   4
14			  3
15			   4
16			1
This way, an image is gradually formed, rather then being drawn top to bottom, it made sense when 56k modems were common, but now it doesn't really have many benefits. Nothing else is different, the only difference is the way the pixels are drawn and compressed.
GIF 89a Extensions
GIF 89a adds many new features to the GIF format, some of which have already been discussed, and Ill briefly mention the names of the additions and how to read them
  • The GIF 89a additions begin with the symbol 0x21; this is found before an image descriptor. After the identifier, another identifier is used to determine which extension is used. The extensions are, with a brief summary:
  • Graphics Control Extension - used mainly for transparency, as well as having a few other flags. Extension identifier is 0xF9.
  • Comment Extension - adds user comments to an image, extension identifier is 0xFE.
  • Plain text extension - Used to draw the image in text. Really itís not terribly important, identifier is 0x01.
  • Application Extension - I'm not sure what use this has, identifier is 0xFF. (It doesn't help load the image so don't worry about it).
For more information on these extensions please consult the official GIF 89a documentation, it can be found in the references section.

Source Code
5a. The Source Code contains 2 ports

The C code is 100% C, and works in both MSVC and Mingw, and I'm positive will work in any ANSII c compatible compiler (if not tell me which one and ill see what I can do). The GIF Loading section requires Allegro, I used version 4 but older and newer versions (within reason) should be compatible (I didn't exactly push it to the limits). They were not tested under LINUX, however GifAllegS is uses the same code base and it works under Linux, so this code should too, hopefully.

The Java code was written for the JVM 1.4.2, but likewise I didn't do anything special, older/newer versions within reason should be compatible. Because Java has built in support for Gifs, I did not port the GIF code over, but you are welcome to do so.

Both ports were written for clarity, and I tried my best to write legible, clear code. The code is not optimized for speed and I wouldn't be surprised if the speed can be easily increased (You should have no trouble speeding things up). Global variables in C were avoided, in fact, none are used. Neither are static variables. Code is meant to be modified, feel free to do so, be sure to give credit, the code is freeware, but on the understanding that credit will be given where it is due. Be sure to thank some people from the reference section if you have room as well, I would never have been able to write this without them.

I didn't do a C++ port, although I admit it would be really cool to use constructors and other stuff, but the Java code is close enough, and easily ported. And besides, the C code works well enough in C++. Though I won't hide the fact that I think the C++ version would come out really nice and be totally awesome to use, especially with the ability for vector tables and other stuff... the Java code is a port, and I didnít use any of its really neat features unfortunately.

5b. Notes on Source Code

Okay, first off if you find a bug please report it to me, and better yet, if you know how to fix it send that too, it'll help greatly. Also you can drop by the http://starsdev.cjb.net/developer/ website and see if any new tutorials ever come up. There you can also find my Allegro GIF library.

GIF interlaced is not supported; neither are local colourmaps or animated GIFs. These are easy to write however, and check the references for any more information you may need. Consider it an exercise. If you look at the source code you'll notice that it loads the entire buffer chunk at once, which is faster then reading each byte, but does waste memory. You can read a byte at a time instead if you wanted to, try doing so as an exercise, you'll find it very simple to do! (Although, most computers do have 256 MB of ram)

Also the code only uses a maximum of 16 bit compression and reads and writes bytes at a time. It is very possible to instead use up to 32 bit compression by replacing chars with ints and using a long as a buffer. However for the sake of clarity and keeping everything nice and easy to use I restricted it to a maximum of 16 bit, but feel free to like I said before to do so as an exercise.

5c. C code notes and bit manipulation

Yep another long section, however a very important one. One of the more confusing aspects of the source code is the write and read code routines, which use a lot of bit manipulation routines, and can therefore be very confusing if you are unsure of exactly how the code works.

I am going to assume you know the basics of bit manipulation, if not then I suggest you read Joseph Farrell very informative and well written tutorial, that can be found over at Gamedev Gamedev Bit Tutorial. It will cover all the basics, give you some information on applying them to code, I will simply only go over exactly how the code works.

The only thing I will briefly go over is the << and >> operators.

<< is left shift, if you have the number 10 in binary here is an example:
Number 10, in binary 0000 1010
apply << operator by one, code looks like 10 << 1
Now Number 20, in binary 0001 0100

>> is right shift, if you have the number 10 again:
Number 10, in binary 0000 1010
apply >> operator, code looks like 10 >> 1
Now Number 5, in binary 0000 0101

It is very important to note that if you right shift the most significant bit is lost, and in left shifting the least significant is lost, this is important because you can use it to your advantage.

1010 0000 << 1 = 0100 0000
Notice how the 1 has disappeared? The 1 has been lost because it cannot be placed anywhere.
The second thing Iíd like to emphasize before we start on dissecting the code is that you have to be very careful with signed data types. Once the signed bit is set, you'll find that the values you'll and how it reacts to bitwise operators change, and can affect your code!
If you take a look at the gif_get_lzw code youíll notice it differs from the version that was used. Why? Let try to use the one we found in the lzw.c/h code! So if you implement it, you'll find it doesnít work, why? Well letís draw up a quick table, along with code needed to implement :

LZW Code, not working correctly

int get_gif_lzw_code(FILE *fp,LZW *table, GIF *gif)
{
unsigned int code,looper;
code = 0;
//Never go over than size - 8or we loose data
while(table->buffer_bits <=((sizeof(int)-sizeof(char))*8))
 {
//No point reading anymore, files done :p
if(feof(fp))
{
     break;
}
   
   table->buffer |= ((unsigned char)fgetc(fp)) <<   
   		 (((sizeof(int)*8  - sizeof(char)*8)- table->buffer_bits));
   table->buffer_bits+= 8; 
   table->to_nbuffer--;
 
   if(table->to_nbuffer <= 0)
     { 
      table->buffer_size = getc(fp);
      table->to_nbuffer = table->buffer_size;
     }
 }
code = table->buffer >> (sizeof(int) * 8 - table->bits);
buffer <<= table->bits;
table->buffer_bits -= table->bits;
return code;
}
In case youíre wondering what the table is for, basically Iíve taken a gif, and taken the first 5 Bytes of the Stream, to demonstrate where in the bit manipulation the new code faults. This is an 8 bit GIF, and the first few bits of the stream; therefore the current codesize in bits is 9.

Tables of Values
------------------------------------------------|
GIF Stream(8 bits)	|Correct Value(9 bits)	|
------------------------------------------------|
Dec. Value|Binary	|Dec. Value|Binary	|
0	  |0000 0000	|256	   |1 0000 0000	|
183	  |1011 0111	|217	   |0 1101 1101	|
9	  |0000 1001	|258	   |1 0000 0010	|
28	  |0001 1100	|259	   |1 0000 0011	|
72	  |0100 1000	|			|
------------------------------------------------|
Table Continued
-------------------------
Received Value(9bits)   |
------------------------|
Dec. Value|Binary	|
1	  |0 0000 0001	|
220	  |0 1101 1100	|
72	  |0 0100 1000	|
452	  |1 1100 0010	|
	  |		|
-------------------------
Letís see just why itís not working correctly. Remember Gif Buffer streams go by bits right to left. Letís take a look at what the code does:
table->buffer |= ((unsigned char)fgetc(fp)) << (((sizeof(int)*8 - sizeof(char)*8)- table->buffer_bits));
table->buffer_bits+= 8;

Basically it adds bits in the using the high bytes, and working its way to the low bytes, so the buffer would look like so in binary after storing the first 4 bytes:
Byte 1	  Byte 2     Byte 3	Byte 4	
|-------|---------|----------|---------|
0000 0000 1011 0111 0000 01001 0001 1100
|---------|----------|-----------|
Code 1	    Code 2	Code 3
Okay, now we can see what happening, basically the values are being stored backwards to what the really should be! Now how do we fix that? Well what would happen if we stored Bytes in the reverse order?
Byte 4	  Byte 3       Byte 2	Byte 1
|--------|----------|---------|--------|
0001 1100 0000 01001 1011 0111 0000 0000
        |---------|----------|---------|
           Code 3      Code 2    Code 1     
Hey it works if we do it like so!And that method is the one used in the source code. Now why donít we also use this method for the other LZW code? Well, first off this routine is not quite as optimized as the other, do you see the extra code = table->buffer << (sizeof(int) * 8 - table->bits);? This is to kick of all the extra bits(remember we talked about this) So as to only get the bits we want. Also you'll notice that the code seems to be written to have a very low overhead, this is pretty pointless today as computers do have more than 1MB of memory, so feel free to modify the code so it loads everything in memory and then compresses for a massive speed increase, and upgrade the code so it can write up to 32 bit code sizes, and reads 16 bits instead of 8, (the code makes it very easy to do so). The code was written like this because I felt it would be easier to understand, not more efficient to use, as this is a tutorial not a Fully Built Ready to rock LZW library :)

5d. Notes on Java Source

Okay, I admit, the Java Code is more of a hack then anything, and it was added as an extra for those who prefer java over C. If something really bothers you about the code or you'd like to improve the code in the tutorial by all means drop me an email and ill see about updating the tutorial. Java is a little different in that it does not directly support unsigned types, which can cause problems as shifting works a little different with negative numbers. I was going to go into detail on why I did certain tricks in the code, but I already did that with the c code, and while searching for sources to use as reference for this section I happened to find a very useful link that will explain it a lot better then I will, and would make a great link in your bookmarks: Java Binary Explanation

References

6a. Bibliography

LZW Data Compression, by Mark Nelson.
This is THE tutorial on the source, without this tutorial, I doubt I would have been able to write the code as well as it came out and have been able to grasp the concept behind it. Well written, and very informative, and although it is showing its age, is still the best source you can find.

LZW, gif decoding by Arturo San Emeterio Campos.
This is a very informative and good read, however there are some really annoying spelling and grammar mistakes(although I've probably made just as many), and the level of writing could be a little higher, but despite its faults it was a helpful resource.

LZW and gif Explained by Steve Blackstock
This tutorial will seem a little cryptic and hard to understand, and its recommended you read the above two tuts before diving into this tutorial, but it does cover its topic well enough, just expect an explanation, not working code.(This is found on a lot of sites, so just google the author if the link doesn't work).

Official GIF documentation 87a/89a at wotsit If you need to know more about GIF, you can go wrong with this handy paper, covers everything you'll ever need to know on the GIF format.

Games++ GIF Decoder by Steven H. Don
Without this written source code I can assure you it would have been harder to have written the GIF code, although it shows its age(don't fread/fwrite structs, its not very safe), it is very clearly written and was a great help.

Allegro GIF Animation Library
A very well written library, and it was also a great help when I was writing the GIF loader

6b. Other references

http://www.datacompression.info/
By Mark Nelson, so you know the site is worth checking out for any compression tutorials you may need

All about GIF 89a
Lots of info on the GIF, Covers those extensions I talked about. GIF specifications can be found here, in plain text format.

6c. The Patent issue

The GIF patent has become somewhat an issue, and a subject of great controversy. Here are some links you may wish to read:
Slashdot
A slashdot posts containing some interesting information on the topc.

Unisys
Unisys, the Patent owner on LZW, their information on the licensing

http://www.cloanto.com/users/mcb/19950127giflzw.html
A very interesting read on the topic, and well worth your time to read!

GNU gif page
Not directly related at GIF use, but has a lot of info on the topic

Burn All GIFs
Written in a very negative manner towards the GIF, but contains a wealth of information on the topic

6d. Libraries/Tools Used

I almost forgot this section, and perhaps left many of you wondering where to find Allegro and some other tools! Well, here you go

Allegro
A Cross Platform Game programming Library, very easy to use and was used for the C/C++ GIF loading code. Precompiled dll's for MSVC can be found at allegro.cc

Java
What you don't know what Java is? Well anyway the SDK at the website lets you write your own software, the JRE lets you run the software compiled with it.

Mingw Dev-C++
Mingw is a free C/C++ compiler for windows, and is really worth using, while Dev-c++ is an IDE for Mingw and also rocks. Mingw has many other IDE's so Google for them if you don't like Dev-C++.

6e. The End

Okay the end of the tutorial! Well, hopefully it wasn't that hard to understand, I tried hard to make it a decent tutorial and easy to understand, and I also tried to make the code as easy to read as possible( lots of comments). If you find any bugs or mistakes in either the code or the tutorial please send them to me at hard_rock_2@yahoo.com Also if you want to translate this tutorial into any other languages, or host the tutorial somewhere, that's great, but be sure to tell me you are doing so and leave the links to my website! Also tell me where you are going to be hosting it so I can send information about possible updates or fixes. You are of course free to link directly to this tutorial http://starsdev.cjb.net/developer/lzw.php,no need to tell me if you are going to do so.

Also don't forget to check out The Stars Development Company Website, and the developer section for any other tutorials I'll write or have written and any of my libraries. That's all folks, happy coding!