1 file changed, 53 insertions, 30 deletions
diff --git a/README.txt b/README.txt
index 6c0a188..cb72884 100644
--- a/README.txt
+++ b/README.txt
@@ -19,8 +19,8 @@ AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
 The demo shows "Hello World" with changing colors, along with its own
 load address, end address, and the current MEMLO. The important part
 is that it got relocated to MEMLO and run from there. The code isn't
-relocatable (see the souce, "hello.s"). The relocator adjusted all the
-absolute addresses on the fly (at load time).
+relocatable (see the source, "hello.s"). The relocator adjusted all
+the absolute addresses on the fly, at load time.
 
 How it works
 ------------
@@ -44,6 +44,11 @@ run. The relocator presented here gets appended to your standard
 executable and relocates it "on the fly", then jumps to the start of
 the relocated code.
 
+The Insight: Atari article mentions that OSS languages use a scheme
+like this to relocate themselves when loaded. The released sources
+for the OSS languages include a BASIC XL program that generates the
+bitmaps.
+
 Example: a subroutine call within our own code:
 
  JSR print_banner
@@ -63,27 +68,29 @@ The code that does the relocation, we'll call the relocator. The term
 is the loader, and we're not replacing it.
 
 The relocator is a small routine that gets appended to the first
-executable (the $4000 one) as a segment, plus two data tables (one for
-the original ORG, code length, init, and run addresses, the other with
-the addresses that need adjusting), plus an INITAD segment that runs
-the relocator code. These all have to load at a fixed address, but
-once they're finished running, they won't be needed again.
+executable (the $4000 one) as a segment, plus two data tables. The
+first is 8 bytes: four words holding the original load address (ORG),
+end address, run address, and init address. The second is a bitmap
+with one bit per byte of the program. A bit is 1 if that byte needs
+relocating, or 0 if not.
+
+The tables are followed by an INITAD segment that runs the relocator
+code. The relocator and the tables have to load at a fixed address,
+but once the relocator has finished, none of them are needed again.
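
For reference, here is a minimal Python sketch of how those extra segments could be laid out in the file. It assumes the standard Atari DOS binary-load format (each segment is a little-endian start address and inclusive end address, followed by the data; the $FFFF header is only needed at the start of the file) and the OS vector INITAD at $02E2. The helper names and the placeholder contents are hypothetical; this is not the author's Perl script.

```python
import struct

INITAD = 0x02E2  # OS vector; DOS calls through it after a segment loads

def segment(start: int, data: bytes) -> bytes:
    """One binary-load segment: start and inclusive end address as
    little-endian words, followed by the data bytes."""
    if not data:
        raise ValueError("a segment needs at least one byte")
    return struct.pack("<HH", start, start + len(data) - 1) + data

def initad_segment(routine: int) -> bytes:
    """A 2-byte segment that stores `routine` into INITAD, so DOS
    calls it as soon as this segment has loaded."""
    return segment(INITAD, struct.pack("<H", routine))

def append_relocator(executable: bytes, reloc_addr: int, reloc_code: bytes,
                     tables_addr: int, tables: bytes) -> bytes:
    """Hypothetical helper: the original executable, then the relocator
    code and its tables at fixed addresses, then the INITAD segment."""
    return (executable
            + segment(reloc_addr, reloc_code)
            + segment(tables_addr, tables)
            + initad_segment(reloc_addr))

# Placeholder contents: a lone RTS ($60) as the "relocator" and
# 8 zero bytes as the "tables", at addresses from the memory map below.
demo = append_relocator(b"", 0x71C0, b"\x60", 0x72C0, bytes(8))
print(demo.hex(" "))
```

Because the INITAD segment comes last, DOS only calls the relocator after both the code and its tables are in memory.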
 
-The relocator has to know the load address and the length of the main
-segment of the program (the part it's going to relocate). What it
-does:
+The relocator has to know the load address and the length of
+the "payload" segment of the program (the part it's going to
+relocate). What it does:
 
 1. Subtract (the high byte of MEMLO, plus 1) from the high byte of the
    load address ($4000 in the example). This gives us a positive number
    (we hope!): the amount to subtract from the high byte of each
    absolute address in the program.
 
-2. Iterate over the relocation data table, subtracting the
-   offset. Each table entry is the two-byte address of a byte that
-   needs to be changed (an absolute address that's "baked" into the
-   program).
-
-3. Move the main segment to the start of the first page above MEMLO.
+2. Loop over the code to be relocated, copying it to the new
+   address (start of the first page above MEMLO). As each byte is
+   moved, it's also adjusted (has the offset subtracted from it) if
+   its bit in the relocation bitmap is set.
 
 3. Set MEMLO to point to the byte after the end of the program
    to protect it from being overwritten by e.g. BASIC or ASM/ED.
@@ -91,7 +98,7 @@ does:
 4. If the program has an init address, subtract the offset from it,
    then JSR to it. This runs the payload program's init routine.
 
-5. If the program has a run address, subtract the offset from it,
+5. If the program has a run address, subtract the offset from it,
    storing the result in RUNAD. Then do an RTS to hand control back
    to DOS. DOS will run the relocated code by jumping to the altered
    RUNAD, in the usual way.
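
The steps above can be modeled in a few lines of Python, working on a bytes object instead of Atari RAM. This is a sketch, not the 6502 relocator itself; it assumes the bitmap layout described below (one bit per byte, most significant bit first), and the function name relocate is made up for illustration. In the usage example only the JSR $4007 at $4000 comes from the README; the other bytes are filler.

```python
def relocate(image: bytes, org: int, bitmap: bytes, memlo: int,
             init: int = 0, run: int = 0):
    """Model of the relocator. `image` is the payload as assembled at
    `org`; `bitmap` has one bit per image byte, MSB first, 1 = adjust.
    Returns (new_base, new_image, new_memlo, new_init, new_run)."""
    if org & 0xFF:
        raise ValueError("ORG must be on a page boundary ($xx00)")
    new_page = (memlo >> 8) + 1          # first page above MEMLO
    offset = (org >> 8) - new_page       # positive, we hope
    if offset < 0:
        raise ValueError("MEMLO is above the original load address")
    out = bytearray()
    for i, b in enumerate(image):        # copy, fixing flagged high bytes
        if bitmap[i >> 3] & (0x80 >> (i & 7)):
            b = (b - offset) & 0xFF
        out.append(b)
    new_base = new_page << 8
    new_memlo = new_base + len(image)    # byte after the program
    shift = offset << 8                  # offset applies to the high byte
    new_init = init - shift if init else 0   # 0 means "none"
    new_run = run - shift if run else 0
    return new_base, bytes(out), new_memlo, new_init, new_run

# The README's example: 23 bytes at $4000, MEMLO = $1CFC.
image = bytearray(23)
image[0:3] = b"\x20\x07\x40"             # JSR $4007
for i in (5, 9, 0x10):                   # other flagged high bytes (filler)
    image[i] = 0x40
bitmap = bytes([0b00100100, 0b01000000, 0b10000000])
base, code, memlo, init, run = relocate(bytes(image), 0x4000, bitmap,
                                        0x1CFC, init=0x4000)
print(f"${base:04X} ${memlo:04X} JSR ${code[2]:02X}{code[1]:02X}")
# prints: $1D00 $1D17 JSR $1D07
```

Copying and adjusting in one pass means the original bytes are read only once, and since the destination is below the source, a forward copy never overwrites bytes it has yet to read.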
@@ -113,11 +120,11 @@ Notes:
   will still be "in the middle of" loading the executable, meaning
   IOCB #1 will still be open for reading.
 
-- The program's end address must be below $6C00, since that's where
+- The program's end address must be below $71C0, since that's where
   the relocator and tables load. The reason for this restriction
   is to allow the relocatable executable to work with a 16K cartridge.
-  The lowest sane start address for the program is probably $2000,
-  which allows the program to be 19KB in size... though $3000 is
+  The lowest sane start address for the program is probably $2800,
+  which allows the program to be 18880 bytes (about 18KB)... though $3000 is
   a lot safer (16832 bytes, about 16KB max).
 
 - Whatever start address (ORG) you use for the program, it has to
@@ -128,9 +135,6 @@ Notes:
 
 - Also, the start address has to be on a page boundary ($xx00).
 
-- The data table size must not exceed 4K. The table is compressed; see
-  "Relocation Table Format", below.
-
 - The original Wilkinson scheme was done entirely in Atari BASIC.
   I use a perl script to create the relocation tables and the
   relocator itself becomes part of the relocatable program, so BASIC
@@ -186,7 +190,7 @@ First table is 8 bytes (4 words):
 - Original run address (or 0 for none)
 - Original init address (or 0 for none)
 
-The next N bytes are the high-byte relocation table. See below.
+The next N bytes are the relocation bitmap table. See below.
 
 For the init address, if it's not zero, the relocator JSRs to it (at its
 new location).
@@ -221,14 +225,29 @@ $16 40 - code_end
 $00 00 - code_run (no run address)
 $00 40 - code_init
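
The same 8 bytes can be produced with Python's struct module, as a quick check of the byte order. This sketch assumes the first word is the ORG ($4000) and that all four words are little-endian, as the listing suggests; header() is a hypothetical name.

```python
import struct

def header(org: int, end: int, run: int = 0, init: int = 0) -> bytes:
    """Hypothetical helper: the 8-byte table as four little-endian
    words, in the order ORG, end address, run address, init address."""
    return struct.pack("<4H", org, end, run, init)

print(" ".join(f"${b:02X}" for b in header(0x4000, 0x4016, init=0x4000)))
# prints: $00 $40 $16 $40 $00 $00 $00 $40
```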
 
-High byte relocation table:
+Relocation bitmap table, in binary:
 
-TODO: finish and fix.
+table byte:   addresses:
+00100100      $4000 to $4007
+01000000      $4008 to $400F
+10000000      $4010 to $4017
+
+The bits are read left to right. The first 1 bit is for
+address $4002, which is the high byte of the first instruction's operand.
+
+The last byte of the table covers addresses past the end of the
+program ($4017 here). Those extra bits are set to 0.
+
+The bitmap table is always 1/8 the size of the code, rounded up to
+the next byte. It might be possible someday to save space by letting
+the table end early, if e.g. the last part of the program is fully
+relocatable code (or data). Currently this isn't done, and I'm not
+sure it's worth the extra complexity to implement.
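
A sketch of building the bitmap in Python (the README's tool is a Perl script; this is not it). build_bitmap is a hypothetical name, and it takes the list of addresses that need adjusting as input rather than working them out. With the example's addresses ($4002 is stated above; $4005, $4009, and $4010 are read off the table), it reproduces the three table bytes.

```python
def build_bitmap(org: int, length: int, reloc_addrs) -> bytes:
    """One bit per program byte, read left to right (MSB first).
    reloc_addrs lists the addresses whose byte must be adjusted."""
    table = bytearray((length + 7) // 8)   # 1/8 the code size, rounded up
    for addr in reloc_addrs:
        i = addr - org
        if not 0 <= i < length:
            raise ValueError(f"${addr:04X} is outside the program")
        table[i >> 3] |= 0x80 >> (i & 7)
    return bytes(table)                    # unused trailing bits stay 0

# The example program: $4000 to $4016 is 23 ($17) bytes.
for b in build_bitmap(0x4000, 0x17, [0x4002, 0x4005, 0x4009, 0x4010]):
    print(f"{b:08b}")
```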
 
 Program loads from $4000 to $4016. If MEMLO was $1CFC, the relocator
 will move the program to $1D00 - $1D16 and set MEMLO to $1D17. The
 operand of the first instruction (was JSR $4007) will be altered
-to $1D07 (aka $4007 - $4000 + $1CFC), which is the address that the
+to $1D07 (aka $4007 - $4000 + $1D00), which is the address that the
 subroutine got relocated to.
 
 The original program assembled to a 32-byte file. The relocatable
@@ -253,6 +272,10 @@ If your code is 18880 bytes, the bitmap size is 2360 bytes.
 Supposing you ORG at $2800:
 
 code - $2800 to $71BF
-relocator - $71C0 to $71CF
-8-byte table: $71D0 to $71D7
-bitmap - $71D8 to $7B10
+relocator - $71C0 to $72BF
+8-byte table: $72C0 to $72C7
+bitmap - $72C8 to $7BFF
+
+18880 bytes is the maximum code size with ORG $2800. There is some
+slack: the relocator is only 185 bytes (256 are reserved for it), and
+the bitmap could extend to $7C1F without overwriting the display list.
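
The memory map above is simple arithmetic, sketched here in Python to make the rounding explicit. layout() is a hypothetical helper; it assumes the 256-byte relocator area shown above and a bitmap of (size + 7) / 8 bytes, rounded down.

```python
def layout(org: int, code_size: int, reloc_area: int = 0x100):
    """Hypothetical helper: inclusive (start, end) ranges for the code,
    the relocator area, the 8-byte table, and the bitmap, in that order."""
    code = (org, org + code_size - 1)
    reloc = (code[1] + 1, code[1] + reloc_area)
    table = (reloc[1] + 1, reloc[1] + 8)
    bitmap_size = (code_size + 7) // 8     # one bit per code byte
    bitmap = (table[1] + 1, table[1] + bitmap_size)
    return code, reloc, table, bitmap

names = ("code", "relocator", "8-byte table", "bitmap")
for name, (lo, hi) in zip(names, layout(0x2800, 18880)):
    print(f"{name}: ${lo:04X} to ${hi:04X}")
```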