Diffstat (limited to 'doc')
| -rw-r--r-- | doc/Arcinfo        | 124                    |
| -rw-r--r-- | doc/DZ.COM         | bin 3683 -> 0 bytes    |
| -rw-r--r-- | doc/LZ.COM         | bin 3310 -> 0 bytes    |
| -rw-r--r-- | doc/LZDZ.zip       | bin 52115 -> 0 bytes   |
| -rw-r--r-- | doc/alf14.atr      | bin 92176 -> 0 bytes   |
| -rw-r--r-- | doc/alf14_doc.txt  | 266                    |
| -rw-r--r-- | doc/fileformat.txt | 80                     |
| -rw-r--r-- | doc/interview.txt  | 166                    |
| -rw-r--r-- | doc/review.txt     | 44                     |
9 files changed, 0 insertions, 680 deletions
diff --git a/doc/Arcinfo b/doc/Arcinfo
deleted file mode 100644
index 6c9d500..0000000
--- a/doc/Arcinfo
+++ /dev/null
@@ -1,124 +0,0 @@
-
-ARC-FILE.INF, created by Keith Petersen, W8SDZ, 21-Sep-86, extracted
-from UNARC.INF by Robert A. Freed.
-
-From: Robert A. Freed
-Subject: Technical Information for ARC files
-Date: June 24, 1986
-
-Note: In the following discussion, UNARC refers to my CP/M-80 program
-for extracting files from MSDOS ARCs. The definitions of the ARC file
-format are based on MSDOS ARC512.EXE.
-
-ARCHIVE FILE FORMAT
--------------------
-
-Component files are stored sequentially within an archive. Each entry
-is preceded by a 29-byte header, which contains the directory
-information. There is no wasted space between entries. (This is in
-contrast to the centralized directory used by Novosielski libraries.
-Although random access to subfiles within an archive can be noticeably
-slower than with libraries, archives do have the advantage of not
-requiring pre-allocation of directory space.)
-
-Archive entries are normally maintained in sorted name order. The
-format of the 29-byte archive header is as follows:
-
-Byte 1: 1A Hex.
- This marks the start of an archive header. If this byte is not found
- when expected, UNARC will scan forward in the file (up to 64K bytes)
- in an attempt to find it (followed by a valid compression version).
- If a valid header is found in this manner, a warning message is
- issued and archive file processing continues. Otherwise, the file is
- assumed to be an invalid archive and processing is aborted. (This is
- compatible with MS-DOS ARC version 5.12). Note that a special
- exception is made at the beginning of an archive file, to accommodate
- "self-unpacking" archives (see below).
-
-Byte 2: Compression version, as follows:
-
- 0 = end of file marker (remaining bytes not present)
- 1 = unpacked (obsolete)
- 2 = unpacked
- 3 = packed
- 4 = squeezed (after packing)
- 5 = crunched (obsolete)
- 6 = crunched (after packing) (obsolete)
- 7 = crunched (after packing, using faster hash algorithm) (obsolete)
- 8 = crunched (after packing, using dynamic LZW variations)
-
-Bytes 3-15: ASCII file name, nul-terminated.
-
-(All of the following numeric values are stored low-byte first.)
-
-Bytes 16-19: Compressed file size in bytes.
-
-Bytes 20-21: File date, in 16-bit MS-DOS format:
- Bits 15:9 = year - 1980
- Bits 8:5 = month of year
- Bits 4:0 = day of month
- (All zero means no date.)
-
-Bytes 22-23: File time, in 16-bit MS-DOS format:
- Bits 15:11 = hour (24-hour clock)
- Bits 10:5 = minute
- Bits 4:0 = second/2 (not displayed by UNARC)
-
-Bytes 24-25: Cyclic redundancy check (CRC) value (see below).
-
-Bytes 26-29: Original (uncompressed) file length in bytes.
- (This field is not present for version 1 entries, byte 2 = 1.
- I.e., in this case the header is only 25 bytes long. Because
- version 1 files are uncompressed, the value normally found in
- this field may be obtained from bytes 16-19.)
-
-
-SELF-UNPACKING ARCHIVES
------------------------
-
-A "self-unpacking" archive is one which can be renamed to a .COM file
-and executed as a program. An example of such a file is the MS-DOS
-program ARC512.COM, which is a standard archive file preceded by a
-three-byte jump instruction. The first entry in this file is a simple
-"bootstrap" program in uncompressed form, which loads the subfile
-ARC.EXE (also uncompressed) into memory and passes control to it. In
-anticipation of a similar scheme for future distribution of UNARC, the
-program permits up to three bytes to precede the first header in an
-archive file (with no error message).
-
-
-CRC COMPUTATION
----------------
-
-Archive files use a 16-bit cyclic redundancy check (CRC) for error
-control. The particular CRC polynomial used is x^16 + x^15 + x^2 + 1,
-which is commonly known as "CRC-16" and is used in many data
-transmission protocols (e.g. DEC DDCMP and IBM BSC), as well as by
-most floppy disk controllers. Note that this differs from the CCITT
-polynomial (x^16 + x^12 + x^5 + 1), which is used by the XMODEM-CRC
-protocol and the public domain CHEK program (although these do not
-adhere strictly to the CCITT standard). The MS-DOS ARC program does
-perform a mathematically sound and accurate CRC calculation. (We
-mention this because it contrasts with some unfortunately popular
-public domain programs we have witnessed, which from time immemorial
-have based their calculation on an obscure magazine article which
-contained a typographical error!)
-
-Additional note (while we are on the subject of CRC's): The validity
-of using a 16-bit CRC for checking an entire file is somewhat
-questionable. Many people quote the statistics related to these
-functions (e.g. "all two-bit errors, all single burst errors of 16 or
-fewer bits, 99.997% of all single 17-bit burst errors, etc."), without
-realizing that these claims are valid only if the total number of bits
-checked is less than 32767 (which is why they are used in small-packet
-data transmission protocols). I.e., for file sizes in excess of about
-4K bytes, a 16-bit CRC is not really as good as what is often claimed.
-This is not to say that it is bad, but there are more reliable methods
-available (e.g. the 32-bit AUTODIN-II polynomial). (End of lecture!)
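Editor's illustration (not part of the deleted file): the polynomial named in
the CRC section, x^16 + x^15 + x^2 + 1, is 0x8005 in the usual bit-mask
notation. A minimal Python sketch of a bitwise CRC built on it follows. The bit
order (least-significant bit first, hence the reflected constant 0xA001), the
zero initial value, and the absence of a final XOR are assumptions taken from
the parameter set commonly catalogued as "CRC-16/ARC"; the text above states
only the polynomial. The function name crc16_arc is made up for this example.

```python
def crc16_arc(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC over `data` using x^16 + x^15 + x^2 + 1.

    Assumed parameters (not stated in the quoted text): bits are fed
    least-significant first, the register starts at 0, no final XOR.
    """
    for byte in data:
        crc ^= byte                      # fold the next byte into the low bits
        for _ in range(8):               # then shift it out one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc
```

Passing the previous result back in as the second argument lets the value be
accumulated over a file that is read in chunks, e.g.
crc16_arc(second_chunk, crc16_arc(first_chunk)).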
-
- Bob Freed
- 62 Miller Road
- Newton Centre, MA 02159
- Telephone (617) 332-3533
-
-
diff --git a/doc/DZ.COM b/doc/DZ.COM
Binary files differ
deleted file mode 100644
index 7a91ef0..0000000
--- a/doc/DZ.COM
+++ /dev/null
diff --git a/doc/LZ.COM b/doc/LZ.COM
Binary files differ
deleted file mode 100644
index 40f7f11..0000000
--- a/doc/LZ.COM
+++ /dev/null
diff --git a/doc/LZDZ.zip b/doc/LZDZ.zip
Binary files differ
deleted file mode 100644
index be3d6bc..0000000
--- a/doc/LZDZ.zip
+++ /dev/null
diff --git a/doc/alf14.atr b/doc/alf14.atr
Binary files differ
deleted file mode 100644
index 1801fc9..0000000
--- a/doc/alf14.atr
+++ /dev/null
diff --git a/doc/alf14_doc.txt b/doc/alf14_doc.txt
deleted file mode 100644
index 832f59d..0000000
--- a/doc/alf14_doc.txt
+++ /dev/null
@@ -1,266 +0,0 @@
- AlfCrunch Documentation Revised 7/10/88
- -----------------------
-
- AlfCrunch is an implementation of the Lempel-Ziv compression
- algorithm. Although it produces files that have the same structure as
- those produced by the Arc program, the two are not compatible. Arc
- cannot uncrunch AlfCrunch files, nor can AlfUnCrunch unarc normal Arc
- files.
-
- The current version of the LZ/DZ files is 1.4. Versions 1.1 through 1.3
- are compatible, but not with 1.0. If you have 1.0, you should discard it
- and use 1.4. The reason for this is that 1.0 used the same header as
- normal Arc crunch. Because of possible confusion over this, the header
- used by AlfCrunch was changed. Since 1.0 had very limited distribution,
- this situation should not often arise. For those who wish to be able to
- detect the AlfCrunch format, the first two bytes of the file will always
- be $1A $0F.
-
- This version fixes an annoying bug in both v1.2 and 1.3. If you had a
-subdirectory entry amongst the filenames you were crunching, LZ would
-stop at the subdir entry. Also the stack errors will now cause a proper
-exit to Dos rather than re-execution.
-
- Enhancements to v1.4 are the addition of time/date support. If you
-are running under Sparta 3.2, LZ will store the Sparta date/time from each
-file into the header. DZ does not use this information, it's just there to
-provide a reference point.
-
- When running either LZ.COM or DZ.COM, Memlo must be under $3000. This
- should not normally be a problem unless you have a lot of handlers
-installed.
- A cartridge may be present, as it only affects the size of the buffer
- available to AlfCrunch. Maximum speed will be achieved without a
- cartridge being present.
-
- A final note
- ------------
-
- Well I think this is about as far as AlfCrunch is going to get for now. I
-don't really believe there are any more features to add without modifying the
-command line parameters. So this version (1.4) will be the last for
-some time to come. Except for bug fixes (few if any I hope) the 1.x line will
-not change. I hope to add command line parameters similar to ARC and maybe
-add the ARC compression methods to finally resolve the compatibility issue.
-
- Alfred
- Programmer's Aid BBS
- (416) 465-4182
-
- Running AlfCrunch
- -----------------
-
- To crunch files, load LZ.COM. The title will be displayed, along
- with the version which should be 1.4. You will then be prompted for
- the output filename. This may be up to 80 characters long,
- including subdirectory names.
-
- If the output file already exists, it is checked to see if it is an
-AlfCrunch file. If the first header is correct, then the new files will be
-appended to it. If the header is wrong the program will print an error
-message and exit to Dos. If the file is shorter than the header length
-(29 bytes), then it is simply opened for normal output, which erases it.
-
- Next you will be prompted for the input filemask. This is what will
- be used to select the files. This may also be up to 80 characters long,
- including any subdirectory names. Wildcards are allowed. If selecting
- all files, the mask must end in *.* .
-
- Finally, you have the option of turning the screen off. Selecting
- this option will speed up the program by 15-20%. Once selected, you will
- not again be prompted for this option. If you do not elect to turn the
- screen off, the program will continue to present this prompt until it is
- selected.
-
- The program will then select files using the mask and compress them,
- displaying the filenames as it progresses. When it has finished, it will
- prompt you for additional input filemasks. You may either enter another
- mask or simply press return to exit back to Dos.
-
- LZ and SpartaDos 3.2
- --------------------
-
- If you are using SpartaDos 3.2, you may invoke LZ.COM and specify
- the output file and input filemask on the command line. The format is:
-
- [Dn:]LZ Dn:[path>]filename[.ext] [Dn:[path>]filename[.ext] ]
-
- The square brackets denote optional parameters which may be omitted.
- The first filename is the output file. The second is the input
- filemask. If you do not specify the input filemask, the program will
- prompt you for it. The program will automatically turn the screen off.
- When it is finished it will prompt you for more input filemasks.
-
- To invoke LZ as part of a batch file, the format is almost identical.
- The lines in the batch file would be:
-
- [Dn:]LZ Dn:[path>]filename[.ext] [Dn:[path>]filename[.ext] ]
- Dn:[path>]filename[.ext] <- Additional
- Dn:[path>]filename[.ext] input masks
-
- The program will read each input filemask, compress the files
- selected and continue until all the input masks have been used. You will
- then be prompted for more input masks. If this is part of a larger batch
- file, leave a single return after the last input mask to force LZ to
- return control back to the batch file. Example:
-
- [Dn:]LZ Dn:[path>]filename[.ext] [Dn:[path>]filename[.ext] ]
- Dn:[path>]filename[.ext]
- Dn:[path>]filename[.ext]
- (single return here)
- [Dn:]LZ Dn:[path>]filename[.ext] [Dn:[path>]filename[.ext] ]
- Dn:[path>]filename[.ext]
- Dn:[path>]filename[.ext]
- (single return here)
-
- At the end of this, you will be left at the Dos prompt. Because of
- the way i/o redirection is handled, an alternative form is available:
-
- [Dn:]LZ
- Dn:[path>]filename[.ext] <- The output file
- Dn:[path>]filename[.ext] <- The input filemask
- Y <- Turn the screen off
- Dn:[path>]filename[.ext] <- Additional
- Dn:[path>]filename[.ext] <- input filemasks
- (single return here)
-
- Notice that the Y was only supplied once. When LZ is run in this
- manner, it behaves exactly as if you were pressing the keys yourself. If
- you turn the screen off, then you need only enter the Y once. If you
- said N, then you would need an N after every input filemask until you
- said Y. Example:
-
- [Dn:]LZ
- Dn:[path>]filename[.ext] <- The output file
- Dn:[path>]filename[.ext] <- The input filemask
- N <- Leave the screen on
- Dn:[path>]filename[.ext] <- Additional mask
- N <- Leave the screen on
- Dn:[path>]filename[.ext] <- Additional mask
- Y <- Screen off now
- Dn:[path>]filename[.ext] <- Additional masks, but no Y
- Dn:[path>]filename[.ext] <- is necessary
- (single return here)
-
- Getting Them Back
- -----------------
-
- To extract the files from an Alfcrunch file, load DZ.COM. The title
- will be displayed, along with the version number.
-
- The first prompt is for the name of the file to uncrunch. This
- filename may be up to 80 characters long, including subdirectory names.
- Wildcards are not allowed.
-
- The next prompt is the output directory. This is the directory where
- the files will be placed when extracted from the crunch file. If the
- directory does not exist, an attempt will be made to create the
- directory. This may involve creating a number of subdirectories to get
- to the last one, so care should be exercised with this feature. If
- errors occur during the directory build stage, an error message will be
- displayed, and the program will return to DOS. You may specify a wildcard to
-only extract certain files or use '*.*' to extract them all. *.* is the default.
-
- Auto directory creation is only available under SpartaDos. Under
- any other Dos, if you specify a subdirectory, you will probably get
-a single file with the name of the first pathname.
-
- Assuming all is well, you again have the option of turning the screen
- off while files are being extracted.
-
- The program will then extract each file and place it in the output
- directory specified. If any errors occur, an error message is printed
- and the program returns to Dos. When all files have been extracted, you
- will be prompted for another input file. You may enter another filename
- or press Return to exit to Dos.
-
- The situation may arise where the crunch file has been corrupted.
- This may occur due to errors during download, or failure of the disk on
- which the file resides. There are several error messages which are
- associated with bit errors.
-
- Msg: Not An AlfCrunch File!
- ---------------------------
- If this message is issued before any files were extracted, then
- either the first two bytes of the file are corrupt, or else the file was
- not created by AlfCrunch. If the message is issued after several files
- were extracted, then the file has been damaged somewhere in the last
- file extracted. You may also get the message which is described next.
-
- Msg: File Checksum In Error
- ---------------------------
- DZ has detected that the checksum calculated for the filename just
- extracted does not agree with the checksum in the header block. Either
- the header block has been damaged or more likely, the file itself has
- been corrupted. If the file is a text file, it may be partially correct.
- Object file types should be discarded, as it must be assumed they are
- corrupt.
-
- Msg: Stack Overrun
- ------------------
- This is an internal DZ error. The file being processed has been
- corrupted, and DZ has exhausted all free memory in attempting to extract
- the data. The output file produced is incomplete, corrupt, and should be
- discarded.
-
- Msg: Extra Bytes At Eof, Don't Add To File
- ------------------------------------------
- This means that the file has extra data at the end which is not valid.
-This may arise from downloading where the last block is padded. Do not add
-new files to it with LZ as you will not be able to get them back when you run
-DZ again. You will get the 'Not An AlfCrunch File!' message at that time.
-
- DZ and SpartaDos 3.2
- --------------------
- If you are using SpartaDos 3.2, you may invoke DZ.COM and specify
- the input file and output directory on the command line. The format is:
-
- [Dn:]DZ Dn:[path>]filename[.ext] [Dn:[path>][*.*]
-
- The square brackets denote optional parameters which may be omitted
- if you wish. The first filename is the file to be processed. The second
- filename is the directory in which the output files are to be placed.
- Remember, if any of the directories in the output path do not exist, an
- attempt will be made to create them. Remember, you can use a wildcard to
-limit the files or take the default
-which is '*.*'.
-
- The program will automatically turn the screen off, and extract
- the files. If any errors occur, the appropriate error message will
- be printed and control will return to Dos.
-
- When DZ is finished with the current input file, it will again prompt
- you for another input file. You may continue uncrunching files, or
- simply press return to exit back to Dos.
-
- As part of a batch file, the form for DZ is almost identical to the
- LZ form. Accordingly, only brief examples will be shown:
-
- [Dn:]DZ Dn:[path>]filename[.ext] [Dn:[path>][*.*]
- Dn:[path>]filename[.ext] <- Second input file
- Dn:[path>][*.*] <- Second output path
- Dn:[path>]filename[.ext] <- Third input file
- Dn:[path>][*.*] <- Third output path
- (single return) <- Return to Dos
-
- The second format is:
-
- [Dn:]DZ Dn:[path>]filename[.ext] <- First input file
- Dn:[path>][*.*] <- First output path
- Dn:[path>]filename[.ext] <- Second input file
- Dn:[path>][*.*] <- Second output path
- Dn:[path>]filename[.ext] <- Third input file
- Dn:[path>][*.*] <- Third output path
- (single return) <- Return to Dos
-
- The third format is:
-
- [Dn:]DZ
- Dn:[path>]filename[.ext] <- First input file
- Dn:[path>][*.*] <- First output path
- Y <- Screen off
- Dn:[path>]filename[.ext] <- Second input file
- Dn:[path>][*.*] <- Second output path
- Dn:[path>]filename[.ext] <- Third input file
- Dn:[path>][*.*] <- Third output path
- (single return) <- Exit to Dos
diff --git a/doc/fileformat.txt b/doc/fileformat.txt
deleted file mode 100644
index 7d87000..0000000
--- a/doc/fileformat.txt
+++ /dev/null
@@ -1,80 +0,0 @@
-ALF Archive Structure
----------------------
-
-An ALF archive is laid out almost exactly like an ARC archive that
-only uses compression types 2 or greater: A 29-byte header for each
-file, followed by the compressed data, followed by either EOF or the
-next file's header.
-
-See the file Arcinfo for the original ARC file format. For ALF files,
-"Byte 2: Compression version" will always be $0F.
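Editor's illustration (not part of the deleted file): the layout just described
(a 29-byte header, then that entry's compressed data, then either the next
header or EOF) can be walked with a short loop. The Python sketch below uses
the field offsets tabulated under "Header structure" in this same file and the
MS-DOS date bit layout from Arcinfo. The names AlfEntry, iter_alf_entries and
decode_dos_date are hypothetical, the whole archive is assumed to be in memory,
and leftover bytes shorter than a header are simply ignored, which is a choice
of this sketch rather than documented DZ behaviour.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

HEADER_LEN = 29               # every ALF entry has a 29-byte header
ALF_SIGNATURE = b"\x1a\x0f"   # $1A marker, then "compression version" $0F


class AlfEntry(NamedTuple):   # hypothetical container for one header
    name: str
    compressed_size: int
    dos_date: int
    dos_time: int
    checksum: int
    original_size: int
    data_offset: int          # where this entry's compressed data begins


def decode_dos_date(value: int) -> Tuple[int, int, int]:
    """Split a 16-bit MS-DOS date into (year, month, day).

    An all-zero value means "no date" and decodes to (1980, 0, 0).
    """
    return 1980 + (value >> 9), (value >> 5) & 0x0F, value & 0x1F


def iter_alf_entries(archive: bytes) -> Iterator[AlfEntry]:
    """Yield one AlfEntry per header found in an in-memory archive."""
    pos = 0
    while pos + HEADER_LEN <= len(archive):
        if archive[pos:pos + 2] != ALF_SIGNATURE:
            raise ValueError(f"no ALF signature at offset {pos}")
        # 13-byte name, then little-endian sizes, date, time and checksum.
        raw_name, csize, date, time, checksum, osize = struct.unpack_from(
            "<13sIHHHI", archive, pos + 2)
        # Keep only the bytes before the first null in the name field.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        yield AlfEntry(name, csize, date, time, checksum, osize,
                       pos + HEADER_LEN)
        pos += HEADER_LEN + csize   # skip the compressed data to the next header
```

The loop only lists entries; it does not decompress anything or verify the
checksum field.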
-
-Header structure:
-
-Offset | Length | Description
--------+--------+------------------------------------------------------
-0      | 2      | ALF signature bytes: $1A $0F
-2      | 13     | Filename (null-terminated)
-15     | 4      | 32-bit compressed size (little-endian)
-19     | 2      | File date in MS-DOS format (same as ARC)
-21     | 2      | File time in MS-DOS format (same as ARC)
-23     | 2      | Checksum (simple additive, *not* a CRC)
-25     | 4      | 32-bit original size (little-endian)
--------+--------+------------------------------------------------------
-
-The compressed data for the file starts at offset 29.
-
-The differences are:
-
-- ALF files use $0F for the 'compression type' (offset 1), whereas
-  ARC files use compression types 1 through 8.
-
-- ALF always uses the 29-byte header; ARC uses 29-byte headers for
-  compression types >= 2, but only 25 bytes for type 1 (stored).
-
-- The actual compressed data is incompatible with any of the
-  compression types supported by ARC. Although ALF uses an
-  implementation of Lempel-Ziv, it's not the same implementation
-  as any of the ones that ARC uses.
-
-- For ARC, the last file's compressed data is followed by a 0 byte
-  (in place of the $1A header), to signal "end of archive". For
-  ALF, there's no data after the last byte of the last compressed
-  file.
-
-- Because ALF doesn't use a 0 byte to signal end-of-archive, it's
-  possible to append two ALF archives together; the result is also
-  a valid ALF archive... unless there's "junk at EOF" on the first
-  file.
-
-- ARC uses CRC-16 for its checksums; ALF just adds the bytes together
-  and uses the low 16 bits of the result as the checksum.
-
-- Not really a file format difference, but the dates stored inside
-  ALF files might be wrong or gibberish, if they were created on
-  an Atari DOS other than SpartaDOS (or, on SpartaDOS, but without
-  the R-Time 8 cartridge).
-
-- ARC and ALF are both limited to 12 character filenames, with a
-  null terminator. With ALF, any remaining bytes in the field after
-  the null will be set to $20 (ASCII spaces, *not* more nulls).
-
-- Atari filenames with no extensions (e.g. "FOO") are stored with
-  a trailing period (e.g. "FOO.") in the ALF header. Upon extraction,
-  Atari DOSes will remove the period, so the file will be called
-  "FOO" again. I'm not sure whether the ARC for the Atari shares this
-  behaviour, but ARC on MS-DOS or Linux doesn't do this.
-
-- ALF files are never embedded inside a self-extracting executable,
-  so the first file's header always starts at the first byte of
-  the file.
-
-- ARC and ALF both store the compressed and uncompressed file lengths
-  as 32-bit unsigned integers... but the Atari can't deal with really
-  large files. From examining the disassembled code of UNALF14.COM,
-  it looks like the highest byte isn't even looked at, meaning the
-  maximum size for a single file is 16MB. I have actually tested the
-  Atari ALF and UNALF programs with an emulator (and emulated hard
-  drive) with a file of 200KB in size, and it worked fine.
-
-Author: B. Watson (urchlay@slackware.uk)
diff --git a/doc/interview.txt b/doc/interview.txt
deleted file mode 100644
index e7d375e..0000000
--- a/doc/interview.txt
+++ /dev/null
@@ -1,166 +0,0 @@
-An email interview with Alfred, author of AlfCrunch for the
-Atari 8-bit.
-
-Date: Thu, 20 Nov 2025 12:35:25 -0500
-From: Alfred
-To: B. Watson <urchlay@slackware.uk>
-Subject: Re: UnAlf
-
-On 2025-11-20 12:37 a.m., B. Watson wrote:
-
-> 1. Was AlfCrunch public domain, shareware, or...?
-
-1. AlfCrunch was public domain, although I never did distribute the
-source code, and as I recall nobody ever asked me for it. The programmer
-at ICD, Mike Gustafson did the same as you. He disassembled the DZ and
-added it to their SpartaDos X along with all the ARC routines so they
-could handle almost anything. Bob Puff at CSS did the same, so he could
-add it to his SuperUnArc. He phoned me one night to say his code was
-faster than mine at decompressing an AlfCrunch file. We had a good laugh
-about it.
-
-> 2. Do you have any old disks (or disk images), or paper
-> notes, anything that you used or referred to while developing
-> AlfCrunch? Source would be awesome (if you still have it and are
-> willing to share it). Even just the original distribution disk would
-> be good to have.
-
-2. I didn't distribute it on disk that I can recall, it was either the
-two files posted on my bbs, or perhaps they were Arc'd, I just don't
-recall now. Probably Arc'd because there was a doc file with it.
-
-I've attached the source code for LZ/DZ. This isn't the original which
-was done with Mac/65, it's broken up to use the Six Forks assembler
-which I had just started using for a bit around then.
-
-> 3. Why not ARC compatible compression? You said you ported a PC
-> program you had the source to... was it just not a matter of having
-> the source for ARC? Or did you decide to go for speed rather than
-> compatibility?
-
-3. I didn't have any source code for ARC and I didn't know what the
-various compression algorithms were. I vaguely knew about Huffman from
-work as one of the big software programs used it, but I had no idea how
-it was implemented. I read the LZW paper but I didn't really understand
-it then. Everyone hated Walden's ARC because it was so slow and it was
-bugged, but it was all there was. One day somewhere I ran across some
-guy's implementation of LZW for the pc, and I thought to try porting it
-because it had to be faster than ARC. It was in assembler, so I could
-kind of understand it. I'd seen some of the ARC source but the C code
-was just gibberish to me. It's why my version is so clunky because I was
-doing like you, just porting each x86 instruction to its sort of 6502
-variant. I couldn't make changes to the code because I didn't understand
-what it was doing back then.
-
- After I released the first version someone called me and said their
-Arcviewer didn't work on .alf files, so I quick fixed the header to be
-Arc compatible to the extent you could see what the files were, and
-that's the 1.4 version. So if you run across a 1.2, it's the same except
-for the header. I don't think hardly anyone saw 1.2 except for some
-local people because I released 1.4 so fast.
-
-> 4. Did you ever work on AlfCrunch after the 1.4 release? You mention a
-> couple of possibilities for the next version in your doc file. Did any
-> of that ever materialize (even if unreleased)?
-
-4. I did some work on a LZ 2.0 but I guess I quit on it, I only have
-some source for the LZ part. I must have thought 1.4 was good enough and
-moved on to something else.
-
-> 5. Are you OK with me redistributing the decompression code from UnAlf
-> under the WTFPL license?
->
-> 6. Are you OK with me including your AtariAge handle in my unalf
-> documentation (man page)?
-
-5 & 6. Sure you can distribute whatever you like and mention me in it.
-It's not like there's many people around who would remember me, heh.
-
-LZW is fairly straightforward (now, lol) but it can be a bit hard to get
-the idea from just reading code. The way it works is a single token is
-output that represents a string, hopefully a really long one like:
-
-token = $122= "went to the store and bought" string associated with that
-token. However I think tokens start as 9 bit values, so you actually
-output 9 bits, not just the 8.
-
-So on the compress side, you start with a table of, I think, 8 bit
-tokens, where the value is 0-$FF, which are every possible input value.
-If were only doing say ASCII text, you could cut it down to maybe 100
-tokens, not sure how many non-printables there are like SOL etc.
-
-Anyway, you start reading in bytes of data. You check a byte against the
-table and if you don't find it, you add the next token value which would
-be $100 and save that byte to that token's entry. Now that can't happen
-with the first token, because it has to be one of the starting 256
-bytes. If you find the token, then you remember it and continue by
-reading the next character. Now you're checking the table to see if
-there's a token for 1st+2nd, which there isn't. So you create a new
-token, $100, and add the 2 byte string as its value, and you output the
-token for the first byte. Now the third byte probably doesn't match the
-first byte, so it'll be the same process. Since there's no string of
-3rd+4th, you'll output the token for the third byte, and add a new token
-that represents those two bytes. Now with a good matching data file,
-like text, you'll probably see 1st+2nd again. So when it sees that first
-byte value, it says ok, I have a token for that, so it keeps reading,
-and it sees the second byte and it goes, I have a token for 1+2 too, so
-then it reads the third byte and now it goes, ok, I don't have a token
-for 1+2+3, so it outputs the token for 1+2 and creates a new token and
-stores the string 1+2+3 as its value.
- So this process just goes on until you run out of data. With a good
-match you'll get longer and longer runs of bytes that match earlier
-strings, so you can get to the point where one token is the same as 40
-characters. That's why LZW is so good. However you run into trouble with
-something like a GIF or JPG because they're all different, you don't get
-runs of bytes. Especially not in JPG because it's already stripped out
-all the runs, which is why JPG files are so small.
-
-The decompress is similar, it just works backwards. You start with the
-same 256 byte table. You read the first token, and it matches, so you
-output the value (the token is the character initially). Since it
-matched, what you would normally do is take the string that you output
-just before this and concatenate the first letter of this string to the
-last output string and add it to the table as a new token value. Since
-there is no previous string when you read the first byte, you do
-nothing. So even if you didn't know what the starting table was, you
-could rebuild it here, because all the initial tokens will be <$100
-because they didn't match anything longer in the beginning of the
-compression, so eventually you will reconstruct the 256 entry table. You
-short-circuit that part by starting with the known table.
-
-So starting with the second token, you end up creating a bunch of second
-level entries, which are the initial table value+something. As long as
-the next token is in the table, you just keep going outputting strings
-and adding new tokens. Now what happens if you get a token that isn't in
-the table. This is where the math becomes magic, I don't really understand
-the theory, but it works. You know it had to have just been created by
-the compressor. So the string that this new token represents has to at
-least start with the last token to which some new character was added.
-So you take the last string output and concatenate its first character
-to itself. So if the last string was the value "ABCD" you create new
-token in the table and add "ABCDA" as its value, and you output the
-string "ABCDA". And so on.
-
-Now you start with 9 bit tokens I think. At some point on the compress
-side, and on the decompress, when you code to add a new token, it's
-going to take more than 9 bits, you up the bitsize to 10, which also
-changes the highest token value you can have, which I think is what the
-MAXCDE value represents. Because of the limited memory, I think I send
-the clear code at the end of filling a 12 bit table, and start over with
-9. Fortunately on the Atari you don't have giant files, so it doesn't
-reset too often.
-
- There are a couple of special tokens, Clear which when you see it you
-clear the table, and the End token which tells you that it's the last
-token of the data.
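Editor's illustration (not part of the deleted file): the compress and
decompress steps Alfred describes can be sketched in a few lines of Python,
including the "token that isn't in the table yet" case from his "ABCD" ->
"ABCDA" example. This is a textbook LZW sketch under simplifying assumptions,
not a reimplementation of LZ.COM or DZ.COM: codes are returned as a list of
integers instead of being packed into 9-to-12-bit fields, there are no Clear or
End tokens, the table grows without limit, and the first new token is $100 as
in his description. Its output is not an ALF stream. The function names are
made up for this example.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Return the LZW token list for `data` (no bit packing, no Clear/End)."""
    table = {bytes([i]): i for i in range(256)}   # the 256 single-byte tokens
    next_code = 0x100                             # first newly created token
    codes: List[int] = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit the longest known string
            table[candidate] = next_code          # remember string + next byte
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])              # flush the final match
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes from a token list made by lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 0x100
    previous = table[codes[0]]                    # first token is a plain byte
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Token the compressor created on its very last step:
            # it must be the previous string plus its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid token {code:#x}")
        out += entry
        table[next_code] = previous + entry[:1]   # mirror the compressor's table
        next_code += 1
        previous = entry
    return bytes(out)
```

A round trip such as lzw_decompress(lzw_compress(data)) == data is a quick
sanity check; a repetitive input like b"ABABABA" exercises the special case.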
-
-A lot of the code in LZ/DZ is the bit twiddling to concatenate the
-varying bitsize tokens in memory. I can't do that sort of math in my
-head, so it's a lot brute force shifting bits left and right to align a
-new token to the last one in memory. The other thing I didn't understand
-is I don't think the code actually stores every full string, maybe it
-does, but at the time I thought the guy was using some scheme whereby he
-was only storing one character with each new token and it somehow was
-chained to the previous string.
-
-That's about all I can tell you about it.
diff --git a/doc/review.txt b/doc/review.txt
deleted file mode 100644
index f56e4c3..0000000
--- a/doc/review.txt
+++ /dev/null
@@ -1,44 +0,0 @@
-The following review was published in the Atari H.A.C.K. magazine,
-in the August 1988 issue (Volume II, Issue IIX) [1]:
-
----------------------------------------------------------------------
-Those of us who are experienced telecommunicators are quite familiar
-with the ARC family of disk file compression programs. The most
-widely used of the 8-bit versions of the ARC program has been,
-and remains to be, ARC version 1.2 (the archiver) and ARCX version
-1.2 (the dearchiver). Two very excellent programs written in C by
-Ralph Walden of the Atari Computer Enthusiasts of Eugene, aka ACE.
-Almost every BBS worth its salt uses this program to compress its
-files not only to make them take up less space, but also to save time
-on file transfers. A smaller program simply takes less time to send
-or receive. Of course, since the file is compressed, or archived, it
-isn't runnable until it's dearced with the ARCX program.
-
-ARC and ARCX are great programs but they have their small problems.
-They are slow and sometimes show unexplainable CRC errors when
-dearcing. This frustrates and detracts from what is otherwise a great
-program. There was none better, that is, until now.
-
-ALFCRUNCH is here. Despite its cute name it has nothing to do with
-the furry wise guy from the planet Melmac. ALFCRUNCH consists of two
-programs, LZ.COM, the archiver, and DZ.COM, the dearchiver. Files are
-manipulated the same way as the ARC programs do it but they are not
-compatible. The LZ program compresses programs slightly more than does
-ARC.COM, or anywhere from a few percent to almost 70%, all depending
-on file type and save method used. The DZ program works as claimed-
-there isn't much to say except that it works. All of this sounds good
-but so what? Why change for a few percent?
-
-The reason to change is speed. ALF programs are at least 10 times
-faster than the ARC programs. Sometimes they are even quicker!
-Programs which may have taken several minutes to process are done
-in seconds with ALF. In fact the first time I tried ALF I thought it
-didn't work... but it does! Reason enough to change? Not yet? Well,
-ALF is free. Get it from your club PD library or download it from
-SLOWPOKE! [2]
----------------------------------------------------------------------
-
-[1] The full issue of HACK can be found here:
- https://archive.org/details/AtariHACKNewsAugust1988
-
-[2] SLOWPOKE was an Atari BBS in the Salem/Portland, Oregon area.
