finished with title compression stuff for now (maybe forever)

author: B. Watson <yalhcru@gmail.com> 2016-01-06 05:26:54 -0500
committer: B. Watson <yalhcru@gmail.com> 2016-01-06 05:26:54 -0500
commit: 1b5919c1c68a379cdc20a56dcb6dfa650f3eac87 (patch)
tree: 967372ddddbe147d4d5def77d96b907d1958ad19 /titlecompression.txt
parent: c51d8df1aec40fc3e74a1c50995fce967689b865 (diff)
download: taipan-1b5919c1c68a379cdc20a56dcb6dfa650f3eac87.tar.gz
1 files changed, 45 insertions, 7 deletions
diff --git a/titlecompression.txt b/titlecompression.txt
index ecbf9df..5f6bf3f 100644
--- a/titlecompression.txt
+++ b/titlecompression.txt
@@ -1,5 +1,26 @@
 The title screen uses a crude form of compression, which I'll call ZRLE
-(zero-run length encoding).
+(zero-run length encoding). It's a special-purpose compression scheme
+I came up with, to meet the following requirements:
+
+- Must be able to compress the Taipan title screen by at least 33%
+  (66% compression ratio, where the decompress code is counted as part
+  of the compressed file size).
+
+- Must be able to decompress in less than 250 bytes of 6502 asm
+  code (2 single-density sectors on disk).
+
+- Must decompress title screen in less than 1/4 sec on the Atari.
+
+All 3 requirements are exceeded slightly: the screen is compressed at a
+60% ratio, the decompressor is around 160 bytes of object code, and it
+runs in approximately 1/5 of a second.
+
+Things that are NOT requirements: it doesn't need to compress anything
+else well, just the title screen. It turns out that it works pretty well
+for bitmapped graphics like the Atari uses, or any kind of file that's
+likely to have large areas of all 0 bytes, but e.g. it can't compress an
+ASCII or ATASCII text file at all (because there are pretty much never
+any null bytes in a human-readable text file).
 
 Theory of operation:
 --------------------
@@ -11,7 +32,11 @@ won't work on arbitrary input data. Why?
 ZRLE relies on some byte values being unused in the input. In other words,
 if the file contains every byte value 0 to 255, it can't be compressed
 with ZRLE. There needs to be at least one unused value per length of
-null run found in the file.
+null run found in the file, because the unused values are used as markers
+telling the decoder how long each run of 0-bytes is.
+
+There is no "escape" byte in ZRLE, or any sort of block structure. Each
+byte is either a byte of pixels, or a marker indicating a run of 0-bytes.
 
 In other words, if the input looks like:
 
@@ -51,7 +76,7 @@ table is written to the file comptitle.s, along with the rest of the
 decoder code from comptitle.s.in. This means the decoder is specific to
 the encoded file, not general-purpose. If we weren't on such a limited
 system as the Atari 800, the table would be part of the output file,
-not the code (like the "dictionary" in a deflated file, etc).
+not the code (like the "dictionary" in a gzipped file, etc).
 
 Each byte value in the compressed data either represents itself (is a
 plain data value) or some number of consecutive zero bytes (is a code
@@ -88,10 +113,10 @@ copied from compressed input to decompressed output as-is)... the value
 05 represents a run of 2 consecutive zeroes, and 06 represents a run
 of 3 zeroes.
 
-The decoder will look at the input, one byte at a time, and consult
-the table to decide what to do with each byte. "Output" here is the
-decompressed data, which you can think of as a list of bytes, which
-starts out empty.
+The decoder will look at the input (the compressed data), one byte at a
+time, and consult the table to decide what to do with each byte. "Output"
+here is the decompressed data, which you can think of as a list of bytes,
+which starts out empty.
 
 input| table lookup |                 |
 data | result       | action          | output after action is taken
@@ -117,6 +142,19 @@ In this dumb example, it's pretty obvious that the encoded data plus
 the table size is bigger than the original input, so the "compression"
 is actually making it bigger. But it serves to illustrate, I hope.
 
+The table lookup values are stored as one byte, so a run of up to 255
+consecutive zeroes can be stored as a single byte. A run of more than
+that could be stored as 2 or more runs in a row, but the current encoder
+just aborts if it encounters a run of 256 zeroes (Taipan doesn't need
+it so I didn't implement it).
+
+The astute reader will have noticed a that a table lookup result of 1,
+meaning a zero-run of length 1, is possible to encode, but useless
+(might as well just store the single zero as plain data), and including
+an entry in the table for 1 makes the table larger by one entry for
+no purpose. The actual encoder (see below) doesn't encode zero-runs
+of length 1, it just stores a 0 as-is.
+
 For Taipan, the title screen (newtitle.png) contains large black areas.
 A black pixel is a 0 bit. The title screen data ends up packed 8 pixels
 per byte (1 bit per pixel), so those black areas are represented by runs
author	B. Watson <yalhcru@gmail.com>	2016-01-06 05:26:54 -0500
committer	B. Watson <yalhcru@gmail.com>	2016-01-06 05:26:54 -0500
commit	1b5919c1c68a379cdc20a56dcb6dfa650f3eac87 (patch)
tree	967372ddddbe147d4d5def77d96b907d1958ad19 /titlecompression.txt
parent	c51d8df1aec40fc3e74a1c50995fce967689b865 (diff)
download	taipan-1b5919c1c68a379cdc20a56dcb6dfa650f3eac87.tar.gz