ZIP? BinHex? UUencode? What’s that?

Question: I was wondering what telecommunication purposes are achieved by using the ZIP format? What alternatives are there to the ZIP format, and which one is the most popular? Also, I am interested in knowing what uuencode is and what telecommunications purposes are achieved by using the uuencode program for DOS. And what alternatives are there to this approach that achieve the same purpose?

Answer: ZIP files are the personal computer equivalent of sitting on an overstuffed suitcase. If a suitcase lid won’t close, you can sit on it and all the excess space and air goes out of the contents. All your long underwear gets scrunched up nice and small and then you can do up the clasps on the case.

In the computer world, ZIPs are really files that have been shrunk by taking repetitions out of the binary code. Binary code is a bunch of ones and zeroes that all files are made of. When you ZIP a file, you pull out all of those repetitions and replace them with markers or tokens.

For example, let’s say a file looks like this at a binary level: 11101100111011010111000111

I can shrink its size by popping in a token (i.e., a place-holder) where there is repetition. Let’s call the token “x” and have it replace 111. After zipping, the file will look like this: x01100x011010x000x.

Notice how the file uses fewer characters once the tokens are in place: 18 instead of the original 26.

If I also replace 00 with “y”, it looks like this: x011yx011010xy0x. Next, I’ll use “z” to mark x011. Now the file is really small: zyz010xy0x.
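The substitution steps above can be replayed in a few lines of Python. This is only a toy sketch of the idea: the function names `shrink` and `expand` are made up for illustration, and a real ZIP program uses a far more sophisticated method and also has to store what each token stands for.

```python
# Toy token substitution, following the same three steps as the walkthrough.
original = "11101100111011010111000111"

# (pattern, token) pairs, applied in this order.
rules = [("111", "x"), ("00", "y"), ("x011", "z")]

def shrink(bits, rules):
    """Replace each pattern with its token, one rule at a time."""
    for pattern, token in rules:
        bits = bits.replace(pattern, token)
    return bits

def expand(text, rules):
    """Undo the substitutions in reverse order to recover the original."""
    for pattern, token in reversed(rules):
        text = text.replace(token, pattern)
    return text

packed = shrink(original, rules)
print(len(original), original)  # 26 11101100111011010111000111
print(len(packed), packed)      # 10 zyz010xy0x

# Nothing is lost: expanding the tokens gives back the exact original.
assert expand(packed, rules) == original
```

The round trip at the end is the important part: compression of this kind shrinks the file without throwing any of it away.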

This is useful in telecommunications because a smaller file takes less time to transmit over the Internet or any network. So file transfers from one computer to another are quicker, because there’s less data to send.

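You’ll find word processor files and uncompressed images (such as BMP files) will typically ZIP down to a fraction of their original size. Formats that are already compressed, such as JPEG, shrink very little. Uncompressed pictures are particularly crunchable. Think of all the white space in a picture of a snowstorm.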
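To put rough numbers on the snowstorm idea, here is a small sketch using Python's built-in `zlib` module, which implements DEFLATE, the compression method most ZIP files use. The "picture" is an assumption made for illustration: just 100,000 identical white pixel values, compared against the same amount of random noise, which has no repetition to exploit.

```python
import os
import zlib

# Hypothetical "snowstorm": 100,000 pure-white 8-bit pixels.
white_pixels = bytes([255]) * 100_000
# Random bytes stand in for data with no repetition at all.
noise = os.urandom(100_000)

packed_white = zlib.compress(white_pixels)
packed_noise = zlib.compress(noise)

print("white:", len(white_pixels), "->", len(packed_white), "bytes")
print("noise:", len(noise), "->", len(packed_noise), "bytes")

# Compression is lossless: decompressing returns the exact original bytes.
assert zlib.decompress(packed_white) == white_pixels
assert zlib.decompress(packed_noise) == noise
```

The all-white data collapses to a tiny fraction of its size, while the random data barely changes. The more repetition a file has, the better it compresses.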

If an entire row of pixels is all white, you can replace it with one token and save enormous amounts of space.

Once upon a time there were other compression formats, such as LZH, ARC and ARJ. You’ll still see those files occasionally, but they have largely been displaced by the enormously popular ZIP format. You can use AlphaZIP to deal with these files.

On the Mac side, you’ll see SIT files. SIT files are known as “StuffIt” files. Try StuffIt Standard Edition 7.0.3 (Mac) to deal with most Mac compression formats.

If you want to try a good compression tool, give IZArc a try.

As for UUENCODE, it is a method of converting a file from binary to ASCII, or plain text, characters. Originally, “uuencode” stood for Unix-to-Unix encode, but it has become a widely used encoding for transferring files between different platforms such as Unix, Windows, and Macintosh, because they all understand ASCII text. ASCII, incidentally, means American Standard Code for Information Interchange and is pronounced “ask-key”.
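As a small illustration of the binary-to-text conversion, Python's standard `binascii` module can uuencode one line of data at a time (up to 45 bytes per call). This sketch shows only the encoded line itself; a complete uuencoded file also wraps the lines in a "begin" header and an "end" trailer, which are left out here.

```python
import binascii

data = b"Cat"  # any bytes would do; b2a_uu accepts at most 45 per call

line = binascii.b2a_uu(data)
print(line)  # b'#0V%T\n'

# Every byte of the encoded line is a plain ASCII character.
assert all(byte < 128 for byte in line)

# Decoding gives back the original bytes unchanged.
assert binascii.a2b_uu(line) == data
```

The leading `#` records how many bytes the line holds (here, three), and the remaining characters carry the data itself using only printable text.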

ASCII is actually a bunch of numeric codes that represent English characters. Each character (letters, digits, punctuation marks, and a few control codes) has been assigned a number from 0 to 127. Check out the conversion table at: fls.cll.wayne.edu
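If you would rather see the codes than look them up, Python's built-in `ord` and `chr` functions convert between a character and its number. The word "ZIP" here is just an arbitrary example.

```python
text = "ZIP"

# ord() turns each character into its numeric code.
codes = [ord(ch) for ch in text]
print(codes)  # [90, 73, 80]

# Every ASCII code falls in the range 0 to 127.
assert all(0 <= code <= 127 for code in codes)

# chr() goes the other way, from number back to character.
assert "".join(chr(code) for code in codes) == text
```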

But back to UUencoding. It has been especially popular for sending e-mail attachments. Many e-mail programs have used uuencoding for sending attachments and uudecoding for receiving them, although MIME with Base64 encoding is now the more common standard. Another popular competing binary conversion format to UUENCODE is BinHex.

Many e-mail programs include both UUencode and BinHex encoders and decoders for sending and receiving attachments. BinHex is an especially common format for Macintosh files. There’s a good BinHex information page with utilities at: natural-innovations.com.