From 3a120e78480a3b43c5cf32d9e6efae5a698abd38 Mon Sep 17 00:00:00 2001
From: "B. Watson" <urchlay@slackware.uk>
Date: Wed, 23 Apr 2025 04:15:47 -0400
Subject: relocate only high bytes, increments of 1 page. previous approach was
 unworkable.

---
 README.txt | 136 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 67 insertions(+), 69 deletions(-)
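
Reviewer note: the scheme described in the README changes below can be
sketched roughly as follows. This is a minimal illustration in Python,
not the perl script or the 6502 relocator from this repository. The
function names are hypothetical, and the bit order within each bitmap
byte (most significant bit first) is an assumption; the README only
says the table is a bitmap 1/8 the size of the code.

```python
def find_high_byte_sites(img_a, img_b):
    """Offsets where the copy assembled one page higher differs by +1."""
    if len(img_a) != len(img_b):
        raise ValueError("images must be the same length")
    sites = []
    for i, (a, b) in enumerate(zip(img_a, img_b)):
        if a == b:
            continue
        if (a + 1) & 0xFF != b:
            raise ValueError(f"unexpected difference at offset {i}")
        sites.append(i)
    return sites


def pack_bitmap(sites, code_len):
    """One bit per code byte. MSB-first bit order is an assumption."""
    bitmap = bytearray((code_len + 7) // 8)
    for i in sites:
        bitmap[i // 8] |= 0x80 >> (i % 8)
    return bytes(bitmap)


def relocate(image, bitmap, org, memlo):
    """Return (new_org, image) relocated to the first page above MEMLO."""
    new_page = (memlo >> 8) + 1
    offset = (org >> 8) - new_page  # subtracted from each flagged high byte
    if offset < 0:
        raise ValueError("MEMLO is above the original load address")
    out = bytearray(image)
    for i in range(len(out)):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out[i] = (out[i] - offset) & 0xFF
    return new_page << 8, bytes(out)


# JSR $4123 followed by RTS, assembled at $4000 and again at $4100.
first = bytes([0x20, 0x23, 0x41, 0x60])
second = bytes([0x20, 0x23, 0x42, 0x60])
table = pack_bitmap(find_high_byte_sites(first, second), len(first))
new_org, moved = relocate(first, table, 0x4000, 0x1D80)
print(hex(new_org), moved.hex())  # 0x1e00 20231f60
```

With MEMLO at $1D80 the code lands at $1E00, and the operand of the JSR
becomes $1F23, which is the new origin plus the routine's $0123 offset.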

diff --git a/README.txt b/README.txt
index 6edc292..07ee72b 100644
--- a/README.txt
+++ b/README.txt
@@ -16,83 +16,74 @@ To build, just type "make". The result is "reloc.atr", which is
 an Atari disk image with DOS 2.0S and the relocatable program as
 AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
 
-The demo just shows "Hello World" with changing colors. The important
-part is that it got relocated to MEMLO and run from there. The code
-isn't relocatable (see the souce, "hello.s"). The relocator adjusted
-all the absolute addresses on the fly (at load time).
+The demo shows "Hello World" with changing colors, along with its own
+load address, end address, and the current MEMLO. The important part
+is that it got relocated to MEMLO and run from there. The code isn't
+relocatable (see the source, "hello.s"). The relocator adjusted all the
+absolute addresses on the fly (at load time).
 
 How it works
 ------------
 
-In the original scheme, you'd assemble the code twice, with the origin
-(start address) one page apart. Say, assemble at address $4000, then
-the 2nd time at $4100. Now, any bytes in the two object files that
-differ by 1, are what needs to be changed when relocating. Suppose you
-want to relocate to $2000, you just subtract $20 from all the bytes in
-the first file that are 1 less than the same byte in the 2nd file.
-
-This works, and is simple enough. The limitation is, you can only
-relocate to an even page boundary. If you want to relocate to the
-bottom of memory (pointed to by MEMLO), you probably will waste a few
-bytes. In DOS 2.0S, I get $1CFC in MEMLO. Relocating to an even page
-boundary means the goes goes at $1D00, and the 4 bytes from $1CFC
-to $1D00 are wasted. That's not so bad... but if I enable another
-drive in DOS, that bumps MEMLO up by 128 bytes, to $1D7C. Then my
-relocatable code ends up at $1E00, and I waste 132 bytes below that...
-
-In the modified form presented here, the code is still assembled
-twice, but the 2nd pass is ORG'ed 258 ($0102) bytes higher than
-the first. Now we have bytes that differ by one (the high bytes of
-addresses) and others that differ by two (the low bytes).
-
-Another, more serious limitation of the code from Insight: Atari is
-that it doesn't produce self-relocating executables. What it produces
-is BASIC programs that have the relocatable object code as DATA
-statements, POKEd into memory when run. The relocator presented here
-gets appended to your standard executable and relocates it "on the
-fly", then jumps to the start of the relocated code.
+You assemble the code twice. The 2nd time around, you set the origin
+one page higher than the first. You have two executables that are
+identical except for the high bytes of absolute addresses within the
+code (which differ by one). Based on this information, the relocator
+can move the code to just above MEMLO and adjust all the addresses so
+it'll actually run in its new location.
+
+Unfortunately, the code can only be relocated by multiples of 256
+bytes. The low bytes aren't adjusted. So unless MEMLO happens to
+contain $FF in its low byte, some memory will be wasted (up to 256
+bytes).
+
+The code from Insight: Atari doesn't produce self-relocating
+executables. What it produces is BASIC programs that have the
+relocatable object code as DATA statements, POKEd into memory when
+run. The relocator presented here gets appended to your standard
+executable and relocates it "on the fly", then jumps to the start of
+the relocated code.
 
 Example: a subroutine call to within our own code:
 
  JSR print_banner
 
 This is the first instruction in our program, so it will be found
-at $4000 for the first assembly pass, and $4102 for the second.
+at $4000 for the first assembly pass, and $4100 for the second.
 
-Say print_banner ends up at $4123 when we assemble at $4000, and $4225
-when assembling at $4102. Further, we determine MEMLO has $1D80. So,
-when we relocate the program, it ends up at $1D80. The target of the
-JSR instruction has to be adjusted to match the new location where
-print_banner is going to be.
+Say print_banner ends up at $4123 when we assemble at $4000, and $4223
+when assembling at $4100. Further, we determine MEMLO has $1D80. So,
+when we relocate the program, it ends up at $1E00 (the start of the
+next page). The target of the JSR instruction has to be adjusted
+to match the new location where print_banner is going to be. After
+relocation, the JSR $4123 reads JSR $1F23 ($1E00 plus the $0123 offset).
 
 The code that does the relocation, we'll call the relocator. The term
 "relocating loader" is used elsewhere, but it's not accurate here: DOS
 is the loader, and we're not replacing it.
 
 The relocator is a small routine that gets appended to the first
-executable (the $4000 one) as a segment, plus two data tables (one
-each for low and high bytes), as another 2 segments, plus an INITAD
-segment that runs the relocator code. These all have to load at a
-fixed address, but once they're finished running, they won't be needed
-again.
+executable (the $4000 one) as a segment, plus two data tables (one for
+the original ORG, code length, init, and run addresses, the other with
+the addresses that need adjusting), plus an INITAD segment that runs
+the relocator code. These all have to load at a fixed address, but
+once they're finished running, they won't be needed again.
 
 The relocator has to know the load address and the length of the main
 segment of the program (the part it's going to relocate). What it
 does:
 
-1. Subtract the contents of MEMLO from the load address ($4000 in the
-   example). This gives us a positive number (we hope!) that is the
-   amount each address in the program should have subtracted from it.
+1. Add 1 to the high byte of MEMLO, then subtract the result from the
+   high byte of the load address ($4000 in the example). This gives us
+   a positive number (we hope!) that is the amount each address's high
+   byte in the program should have subtracted from it.
 
-2. Iterate over the two data tables, subtracting the offset. Each table entry
-   is the two-byte address of a byte that needs to be changed (an
-   absolute address that's "baked" into the program). The high and low
-   bytes of the addresses in the code are handled separately (hence
-   the two tables). The low byte of the offset is subtracted from the
-   bytes at the addresses in the low-byte table, and the high byte of
-   the offset for the high-byte table.
+2. Iterate over the relocation data table, subtracting the
+   offset. Each table entry is the two-byte address of a byte that
+   needs to be changed (an absolute address that's "baked" into the
+   program).
 
-3. Move the main segment to MEMLO.
+3. Move the main segment to the start of the first page above MEMLO.
 
 4. Set MEMLO to point to the byte after the end of the program
    to protect it from being overwritten by e.g. BASIC or ASM/ED.
@@ -135,14 +126,10 @@ Notes:
   DOS and/or have lots of device drivers loaded, MEMLO could exceed
   $2000, which would cause your program to crash when loaded.
 
-- The data tables' combined size must not exceed 4K. Generally the
-  tables will be the same size, and each entry is 2 bytes, so this
-  means you can't have more than about 1000 absolute references in
-  your code. This doesn't count references that point outside your
-  code, like e.g. JMP CIOV or STA CRSINH; these won't be relocated,
-  or your program wouldn't work. As a reference, the 8K Atari BASIC
-  cartridge would require 1522 bytes of data tables, if we were trying
-  to relocate it.
+- Also, the start address has to be on a page boundary ($xx00).
+
+- The data table size must not exceed 4K. The table is compressed; see
+  "Relocation Table Format", below.
 
 - The original Wilkinson scheme was done entirely in Atari BASIC.
   I use a perl script to create the relocation tables and the
@@ -193,19 +180,13 @@ in the relocatable file as segments.
 
 Relocation tables start immediately after the last byte of the relocator.
 
-First 8 bytes are 4 words:
+First table is 8 bytes (4 words):
 - Original load address
 - Original end address
 - Original run address (or 0 for none)
 - Original init address (or 0 for none)
 
-The next N bytes are the high-byte relocation table. Each entry
-is a word, the address of a byte within the program that has to be
-relocated. The table ends with $0000.
-
-The next N bytes are the low-byte table, same format as the high-byte
-table including the $0000 at the end. The high and low byte tables
-will generally be the same size, but this is not a requirement.
+The next N bytes are the high-byte relocation table. See below.
 
 For the init address, if it's not zero, the relocator JSR's to it (at its
 new location).
@@ -268,3 +249,20 @@ version will be around 200 bytes: 28 bytes for the original file
 for the address table, and 20 bytes for the two relocation tables.
 However, the relocator and tables are only used once, and can be
 overwritten afterwards (so they count as free memory).
+
+Relocation Table Format
+
+Bitmap.
+
+The relocator is 256 bytes long or less.
+The GR.0 display list with a 16K cart in is at $7C20.
+We want to end the bitmap at $7C00.
+Bitmap table will always be 1/8 the code size.
+
+If your code is 18880 bytes, the bitmap size is 2360 bytes.
+Supposing you ORG at $2800:
+
+code - $2800 to $71BF
+relocator - $71C0 to $71CF
+8-byte table - $71D0 to $71D7
+bitmap - $71D8 to $7B0F
-- 
cgit v1.2.3