An email interview with Alfred, author of AlfCrunch for the Atari 8-bit.

Date: Thu, 20 Nov 2025 12:35:25 -0500
From: Alfred
To: B. Watson
Subject: Re: UnAlf

On 2025-11-20 12:37 a.m., B. Watson wrote:
> 1. Was AlfCrunch public domain, shareware, or...?

1. AlfCrunch was public domain, although I never did distribute the source code, and as I recall nobody ever asked me for it. The programmer at ICD, Mike Gustafson, did the same as you: he disassembled the DZ and added it to their SpartaDOS X along with all the ARC routines, so it could handle almost anything. Bob Puff at CSS did the same, so he could add it to his SuperUnArc. He phoned me one night to say his code was faster than mine at decompressing an AlfCrunch file. We had a good laugh about it.

> 2. Do you have any old disks (or disk images), or paper
> notes, anything that you used or referred to while developing
> AlfCrunch? Source would be awesome (if you still have it and are
> willing to share it). Even just the original distribution disk would
> be good to have.

2. I didn't distribute it on disk that I can recall; it was either the two files posted on my BBS, or perhaps they were Arc'd, I just don't recall now. Probably Arc'd, because there was a doc file with it. I've attached the source code for LZ/DZ. This isn't the original, which was done with Mac/65; it's broken up to use the Six Forks assembler, which I had just started using for a bit around then.

> 3. Why not ARC compatible compression? You said you ported a PC
> program you had the source to... was it just not a matter of having
> the source for ARC? Or did you decide to go for speed rather than
> compatibility?

3. I didn't have any source code for ARC, and I didn't know what the various compression algorithms were. I vaguely knew about Huffman from work, as one of the big software programs used it, but I had no idea how it was implemented. I read the LZW paper, but I didn't really understand it then. Everyone hated Walden's ARC because it was so slow and it was bugged, but it was all there was.

One day somewhere I ran across some guy's implementation of LZW for the PC, and I thought to try porting it because it had to be faster than ARC. It was in assembler, so I could kind of understand it. I'd seen some of the ARC source, but the C code was just gibberish to me. That's why my version is so clunky: I was doing like you, just porting each x86 instruction to its sort-of 6502 variant. I couldn't make changes to the code because I didn't understand what it was doing back then.

After I released the first version, someone called me and said their Arcviewer didn't work on .alf files, so I quickly fixed the header to be ARC-compatible, to the extent that you could see what the files were, and that's the 1.4 version. So if you run across a 1.2, it's the same except for the header. I don't think anyone but some local people ever saw 1.2, because I released 1.4 so fast.

> 4. Did you ever work on AlfCrunch after the 1.4 release? You mention a
> couple of possibilities for the next version in your doc file. Did any
> of that ever materialize (even if unreleased)?

4. I did some work on an LZ 2.0, but I guess I quit on it; I only have some source for the LZ part. I must have thought 1.4 was good enough and moved on to something else.

> 5. Are you OK with me redistributing the decompression code from UnAlf
> under the WTFPL license?
>
> 6. Are you OK with me including your AtariAge handle in my unalf
> documentation (man page)?

5 & 6.
Sure, you can distribute whatever you like and mention me in it. It's not like there's many people around who would remember me, heh.

LZW is fairly straightforward (now, lol), but it can be a bit hard to get the idea from just reading code. The way it works is that a single token is output which represents a string, hopefully a really long one, like token $122 = "went to the store and bought", the string associated with that token. However, I think tokens start as 9-bit values, so you actually output 9 bits, not just 8.

So on the compress side, you start with a table of, I think, 8-bit tokens, where the values are 0-$FF, which is every possible input value. If you were only doing, say, ASCII text, you could cut it down to maybe 100 tokens; I'm not sure how many non-printables there are, like SOL etc. Anyway, you start reading in bytes of data. You check a byte against the table, and if you don't find it, you add the next token value, which would be $100, and save that byte to that token's entry. Now that can't happen with the first token, because it has to be one of the starting 256 bytes. If you find the token, then you remember it and continue by reading the next character.

Now you're checking the table to see if there's a token for 1st+2nd, which there isn't. So you create a new token, $100, add the two-byte string as its value, and output the token for the first byte. Now the third byte probably doesn't match the first byte, so it'll be the same process: since there's no string of 3rd+4th, you'll output the token for the third byte and add a new token that represents those two bytes.

Now, with a good matching data file, like text, you'll probably see 1st+2nd again. So when it sees that first byte value, it says OK, I have a token for that, so it keeps reading, and it sees the second byte and it goes, I have a token for 1+2 too; so then it reads the third byte and now it goes, OK, I don't have a token for 1+2+3, so it outputs the token for 1+2, creates a new token, and stores the string 1+2+3 as its value. This process just goes on until you run out of data. With a good match you'll get longer and longer runs of bytes that match earlier strings, so you can get to the point where one token is the same as 40 characters. That's why LZW is so good. However, you run into trouble with something like a GIF or JPG, because they're all different; you don't get runs of bytes. Especially not in a JPG, because it's already stripped out all the runs, which is why JPG files are so small.
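
[As a rough illustration of the compression loop Alfred describes above, here is a minimal LZW sketch in Python. It is only a sketch of the general algorithm, not AlfCrunch's actual 6502 code: it hands back the token numbers as plain integers instead of packing them into 9- to 12-bit codes, and the names are made up for the example.]

    def lzw_compress(data: bytes) -> list[int]:
        # Start with the 256 single-byte strings as tokens 0-$FF.
        table = {bytes([b]): b for b in range(256)}
        next_token = 0x100          # first token above the initial table
        tokens = []

        current = b""               # longest string matched so far
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in table:
                # Still matching a known string; keep extending it.
                current = candidate
            else:
                # Output the token for the longest match, then add the
                # longer string to the table as a brand-new token.
                tokens.append(table[current])
                table[candidate] = next_token
                next_token += 1
                current = bytes([byte])

        if current:
            tokens.append(table[current])   # flush the final match
        return tokens

[Fed b"ABABABAB", this returns [65, 66, 256, 258, 66]: five tokens for eight bytes, with $100 and $102 standing in for the repeated "AB" and "ABA" strings.]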
The decompress is similar; it just works backwards. You start with the same 256-entry table. You read the first token and it matches, so you output the value (the token is the character, initially). Since it matched, what you would normally do is take the string that you output just before this one, concatenate the first letter of the current string onto it, and add that to the table as a new token's value. Since there is no previous string when you read the first token, you do nothing. So even if you didn't know what the starting table was, you could rebuild it here, because all the initial tokens will be <$100, since they didn't match anything longer at the beginning of the compression, so eventually you would reconstruct the 256-entry table. You short-circuit that part by starting with the known table. So starting with the second token, you end up creating a bunch of second-level entries, which are an initial table value plus something.

As long as the next token is in the table, you just keep going, outputting strings and adding new tokens. Now, what happens if you get a token that isn't in the table? This is where the math becomes magic; I don't really understand the theory, but it works. You know it had to have just been created by the compressor, so the string that this new token represents has to at least start with the last string output, with some new character added. So you take the last string output and concatenate its first character onto itself. If the last string was the value "ABCD", you create a new token in the table, add "ABCDA" as its value, and output the string "ABCDA". And so on.

Now, you start with 9-bit tokens, I think. At some point, on the compress side and on the decompress side, when you go to add a new token it's going to take more than 9 bits, so you up the bit size to 10, which also changes the highest token value you can have; I think that's what the MAXCDE value represents. Because of the limited memory, I think I send the clear code at the end of filling a 12-bit table and start over with 9. Fortunately on the Atari you don't have giant files, so it doesn't reset too often. There are a couple of special tokens: Clear, which tells you to clear the table when you see it, and the End token, which tells you that it's the last token of the data.

A lot of the code in LZ/DZ is the bit twiddling to concatenate the varying-bitsize tokens in memory. I can't do that sort of math in my head, so it's a lot of brute-force shifting bits left and right to align a new token to the last one in memory. The other thing I didn't understand is that I don't think the code actually stores every full string. Maybe it does, but at the time I thought the guy was using some scheme whereby he was only storing one character with each new token, and it was somehow chained to the previous string. That's about all I can tell you about it.
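
[Again purely as an illustration: a matching decompressor sketch in Python for the integer tokens produced by the sketch above. It shows the one special case Alfred describes, a token that isn't in the table yet, but it skips the packed 9- to 12-bit codes and the Clear/End tokens of the real .alf format; the names are made up.]

    def lzw_decompress(tokens: list[int]) -> bytes:
        # Same starting table, but mapping token -> string this time.
        table = {b: bytes([b]) for b in range(256)}
        next_token = 0x100

        out = bytearray()
        previous = b""                  # last string we output
        for token in tokens:
            if token in table:
                entry = table[token]
            elif token == next_token:
                # The compressor created this token on the very step
                # that emitted it, so its string must be the previous
                # string plus that string's own first byte -- the
                # "ABCD" -> "ABCDA" case described above.
                entry = previous + previous[:1]
            else:
                raise ValueError("corrupt token stream")

            out += entry

            if previous:
                # Mirror the compressor: previous string + first byte
                # of the current string becomes the next new token.
                table[next_token] = previous + entry[:1]
                next_token += 1
            previous = entry

        return bytes(out)

[The "one character chained to the previous string" scheme Alfred mentions at the end is the usual way LZW tables are stored when memory is tight: each new entry is just a pair of (previous token, appended byte), and a string is recovered by walking that chain backwards, so the table never has to hold whole strings.]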