ALF Compressed Data Stream -------------------------- This describes the compressed data for each file in an ALF archive. Each archive member has a 29-byte header (described in fileformat.txt), followed by its compressed data. The compression algorithm is an implementation of LZW. See: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch ALF uses a variable bit-width encoding, 9 to 12 bits per code. Codes are numbered 0 to 4095 (aka 2^12-1). Each code represents either an uncompressed data byte, a control code, or a multibyte string made from previously-seen input, thus: Code Purpose ------------+---------------------------------------------- 0 to 255 | 1-byte string (predefined) 256 | Reset (clear token table, set bit size to 9) 257 | End of compressed data 258 to 4095 | Multibyte strings ------------+---------------------------------------------- Codes 0 to 255 are mapped to their values (e.g. code 65 represents the byte 65 in the input). As the encoder reads the input, it builds a dictionary of codes. Initially, codes are stored in the output as 9 bits each. When the number of codes exceeds 511 (the maximum number that can be stored in 9 bits), the code bit size is increased to 10. Likewise, when the code count exceeds a 10-bit value (code 1023), the bit size becomes 11, then after code 2047 it becomes 12. When all 12-bit code numbers are used (code 4095), the encoder writes a reset/clear code, clears the dictionary, and resets the bit size to 9. Notice that, other than the change from 12 back to 9, there is no explicit "next bit size" code in the file. The encoder and decoder must agree upon when the bit size increases. When the bit size changes, the last code is written at the current bit size (including the reset/clear code), and the following code is at the new bit size. This means ALF doesn't use an "early change" variant of LZW. Bits are stored to the file in MSB-first order. If the compressed data doesn't end on a byte boundary (has a number of bits not divisible by 8), the least significant bits of the last byte are filled with 0 bits. ALF compressed streams always begin with a reset code (256), *except* for empty files (length of 0 bytes). The stream always ends with an end code (257). This means a 0-byte file consists of one token (the end code), 9 bits long, which occupies 2 bytes. A 1-byte file compresses to a reset code, the one byte, and an end code (9 bits each, 27 bits total, or 4 bytes).