relocate only high bytes, increments of 1 page. previous approach was unworkable.

author: B. Watson <urchlay@slackware.uk> 2025-04-23 04:15:47 -0400
committer: B. Watson <urchlay@slackware.uk> 2025-04-23 04:15:47 -0400
commit: 3a120e78480a3b43c5cf32d9e6efae5a698abd38 (patch)
tree: c1a1efc82f4e65abe7496e271faf621c5d6e3c48
parent: ed938284bfe486666917832888cd2894966edaab (diff)
download: atari8-self-relocator-3a120e78480a3b43c5cf32d9e6efae5a698abd38.tar.gz
6 files changed, 210 insertions, 141 deletions
diff --git a/Makefile b/Makefile
index 8212e19..412c0ab 100644
--- a/Makefile
+++ b/Makefile
@@ -12,7 +12,7 @@ hello40.xex: hello.s
 	cl65 -t none -o hello40.xex --asm-define start_addr=0x4000 hello.s
 
 hello41.xex: hello.s
-	cl65 -t none -o hello41.xex --asm-define start_addr=0x4102 hello.s
+	cl65 -t none -o hello41.xex --asm-define start_addr=0x4100 hello.s
 
 clean:
 	rm -f reloc.atr hello40.xex hello41.xex reloc.xex *.o
diff --git a/README.txt b/README.txt
index 6edc292..07ee72b 100644
--- a/README.txt
+++ b/README.txt
@@ -16,83 +16,74 @@ To build, just type "make". The result is "reloc.atr", which is
 an Atari disk image with DOS 2.0S and the relocatable program as
 AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
 
-The demo just shows "Hello World" with changing colors. The important
-part is that it got relocated to MEMLO and run from there. The code
-isn't relocatable (see the souce, "hello.s"). The relocator adjusted
-all the absolute addresses on the fly (at load time).
+The demo shows "Hello World" with changing colors, along with its own
+load address, end address, and the current MEMLO. The important part
+is that it got relocated to MEMLO and run from there. The code isn't
+relocatable (see the source, "hello.s"). The relocator adjusted all the
+absolute addresses on the fly (at load time).
 
 How it works
 ------------
 
-In the original scheme, you'd assemble the code twice, with the origin
-(start address) one page apart. Say, assemble at address $4000, then
-the 2nd time at $4100. Now, any bytes in the two object files that
-differ by 1, are what needs to be changed when relocating. Suppose you
-want to relocate to $2000, you just subtract $20 from all the bytes in
-the first file that are 1 less than the same byte in the 2nd file.
-
-This works, and is simple enough. The limitation is, you can only
-relocate to an even page boundary. If you want to relocate to the
-bottom of memory (pointed to by MEMLO), you probably will waste a few
-bytes. In DOS 2.0S, I get $1CFC in MEMLO. Relocating to an even page
-boundary means the goes goes at $1D00, and the 4 bytes from $1CFC
-to $1D00 are wasted. That's not so bad... but if I enable another
-drive in DOS, that bumps MEMLO up by 128 bytes, to $1D7C. Then my
-relocatable code ends up at $1E00, and I waste 132 bytes below that...
-
-In the modified form presented here, the code is still assembled
-twice, but the 2nd pass is ORG'ed 258 ($0102) bytes higher than
-the first. Now we have bytes that differ by one (the high bytes of
-addresses) and others that differ by two (the low bytes).
-
-Another, more serious limitation of the code from Insight: Atari is
-that it doesn't produce self-relocating executables. What it produces
-is BASIC programs that have the relocatable object code as DATA
-statements, POKEd into memory when run. The relocator presented here
-gets appended to your standard executable and relocates it "on the
-fly", then jumps to the start of the relocated code.
+You assemble the code twice. The 2nd time around, you set the origin
+one page higher than the first. You have two executables that are
+identical except for the high bytes of absolute addresses within the
+code (which differ by one). Based on this information, the relocator
+can move the code to just above MEMLO and adjust all the addresses so
+it'll actually run in its new location.
+
+Unfortunately, the code can only be relocated by multiples of 256
+bytes, because only the high bytes are adjusted. So unless the low
+byte of MEMLO happens to be $00, some memory will be wasted (up to
+255 bytes).
+
+The code from Insight: Atari doesn't produce self-relocating
+executables. What it produces is BASIC programs that have the
+relocatable object code as DATA statements, POKEd into memory when
+run. The relocator presented here gets appended to your standard
+executable and relocates it "on the fly", then jumps to the start of
+the relocated code.
 
 Example: a subroutine call to within our own code:
 
  JSR print_banner
 
 This is the first instruction in our program, so it will be found
-at $4000 for the first assembly pass, and $4102 for the second.
+at $4000 for the first assembly pass, and $4100 for the second.
 
-Say print_banner ends up at $4123 when we assemble at $4000, and $4225
-when assembling at $4102. Further, we determine MEMLO has $1D80. So,
-when we relocate the program, it ends up at $1D80. The target of the
-JSR instruction has to be adjusted to match the new location where
-print_banner is going to be.
+Say print_banner ends up at $4123 when we assemble at $4000, and $4223
+when assembling at $4100. Further, we determine MEMLO has $1D80. So,
+when we relocate the program, it ends up at $1E00 (the start of the
+next page). The target of the JSR instruction has to be adjusted
+to match the new location where print_banner is going to be. After
+relocation, the JSR $4123 reads JSR $1F23 (its high byte drops by $22).
 
 The code that does the relocation, we'll call the relocator. The term
 "relocating loader" is used elsewhere, but it's not accurate here: DOS
 is the loader, and we're not replacing it.
 
 The relocator is a small routine that gets appended to the first
-executable (the $4000 one) as a segment, plus two data tables (one
-each for low and high bytes), as another 2 segments, plus an INITAD
-segment that runs the relocator code. These all have to load at a
-fixed address, but once they're finished running, they won't be needed
-again.
+executable (the $4000 one) as a segment, plus two data tables (one with
+the original load, end, run, and init addresses, the other with the
+addresses of the bytes that need adjusting), plus an INITAD segment
+that runs the relocator code. These all have to load at a fixed
+address, but once they're finished running, they won't be needed again.
 
 The relocator has to know the load address and the length of the main
 segment of the program (the part it's going to relocate). What it
 does:
 
-1. Subtract the contents of MEMLO from the load address ($4000 in the
-   example). This gives us a positive number (we hope!) that is the
-   amount each address in the program should have subtracted from it.
+1. Round MEMLO up to the next page boundary (unless its low byte is
+   already $00), then subtract its high byte from the high byte of the
+   load address ($4000 in the example). This gives us a positive number
+   (we hope!) of pages to subtract from each address's high byte.
 
-2. Iterate over the two data tables, subtracting the offset. Each table entry
-   is the two-byte address of a byte that needs to be changed (an
-   absolute address that's "baked" into the program). The high and low
-   bytes of the addresses in the code are handled separately (hence
-   the two tables). The low byte of the offset is subtracted from the
-   bytes at the addresses in the low-byte table, and the high byte of
-   the offset for the high-byte table.
+2. Iterate over the relocation data table, subtracting the
+   offset. Each table entry is the two-byte address of a byte that
+   needs to be changed (an absolute address that's "baked" into the
+   program).
 
-3. Move the main segment to MEMLO.
+3. Move the main segment to MEMLO, rounded up to a page boundary.
 
 4. Set MEMLO to point to the byte after the end of the program
    to protect it from being overwritten by e.g. BASIC or ASM/ED.
@@ -135,14 +126,10 @@ Notes:
   DOS and/or have lots of device drivers loaded, MEMLO could exceed
   $2000, which would cause your program to crash when loaded.
 
-- The data tables' combined size must not exceed 4K. Generally the
-  tables will be the same size, and each entry is 2 bytes, so this
-  means you can't have more than about 1000 absolute references in
-  your code. This doesn't count references that point outside your
-  code, like e.g. JMP CIOV or STA CRSINH; these won't be relocated,
-  or your program wouldn't work. As a reference, the 8K Atari BASIC
-  cartridge would require 1522 bytes of data tables, if we were trying
-  to relocate it.
+- Also, the start address must be on a page boundary ($xx00).
+
+- The data table size must not exceed 4K. The table is compressed; see
+  "Relocation Table Format", below.
 
 - The original Wilkinson scheme was done entirely in Atari BASIC.
   I use a perl script to create the relocation tables and the
@@ -193,19 +180,13 @@ in the relocatable file as segments.
 
 Relocation tables start immediately after the last byte of the relocator.
 
-First 8 bytes are 4 words:
+First table is 8 bytes (4 words):
 - Original load address
 - Original end address
 - Original run address (or 0 for none)
 - Original init address (or 0 for none)
 
-The next N bytes are the high-byte relocation table. Each entry
-is a word, the address of a byte within the program that has to be
-relocated. The table ends with $0000.
-
-The next N bytes are the low-byte table, same format as the high-byte
-table including the $0000 at the end. The high and low byte tables
-will generally be the same size, but this is not a requirement.
+The next N bytes are the high-byte relocation table. See below.
 
 For the init address, if it's not zero, the relocator JSR's to it (at its
 new location).
@@ -268,3 +249,20 @@ version will be around 200 bytes: 28 bytes for the original file
 for the address table, and 20 bytes for the two relocation tables.
 However, the relocator and tables are only used once, and can be
 overwritten afterwards (so they count as free memory).
+
+Relocation Table Format
+-----------------------
+
+The relocation table is a bitmap, 1/8 the size of the code.
+
+The relocator is 256 bytes long or less. With a 16K cartridge
+installed, the GR.0 display list is at $7C20, so we want the bitmap
+to end by $7C00.
+
+If your code is 18880 bytes, the bitmap size is 2360 bytes.
+Supposing you ORG at $2800:
+
+code         - $2800 to $71BF
+relocator    - $71C0 to $71CF
+8-byte table - $71D0 to $71D7
+bitmap       - $71D8 to $7B0F
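The layout numbers in the "Relocation Table Format" notes can be checked with a little arithmetic. This is a Python sketch (the commit contains no such script), assuming the bitmap holds one bit per byte of code, rounded up to a whole byte; bitmap_layout is a hypothetical name.

```python
# Layout arithmetic for the "Relocation Table Format" notes (illustration only).
# Assumption: the bitmap holds one bit per byte of code, so its size is the
# code length divided by 8, rounded up to a whole byte.

def bitmap_layout(org, code_len, reloc_len):
    """Return (name, first, last) for each region, packed back to back."""
    sizes = [
        ("code", code_len),
        ("relocator", reloc_len),
        ("8-byte table", 8),
        ("bitmap", (code_len + 7) // 8),
    ]
    regions = []
    addr = org
    for name, size in sizes:
        regions.append((name, addr, addr + size - 1))  # last byte is inclusive
        addr += size
    return regions


if __name__ == "__main__":
    # The README's example: 18880 bytes of code at $2800, 16-byte relocator.
    for name, first, last in bitmap_layout(0x2800, 18880, 16):
        print(f"{name:<12} ${first:04X} to ${last:04X}")
```

Run as-is, it prints the four regions for that example. The bitmap comes out as $71D8 to $7B0F: 2360 bytes starting at $71D8 stop one byte short of $7B10, comfortably below $7C00.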
diff --git a/autorun.sys b/autorun.sys
index 31e2914..41fc439 100644
--- a/autorun.sys
+++ b/autorun.sys
diff --git a/hello.s b/hello.s
index fa0ede0..8642218 100644
--- a/hello.s
+++ b/hello.s
@@ -17,10 +17,23 @@
  .endif
 
  .org start_addr
+print_addr:
+ jsr pa
+pa:         ; pull return address, print it (don't put it back on the stack)
+ pla
+ sec
+ sbc #2     ; pa-1 was pushed, we want to print print_addr
+ sta sptr   ; lo byte
+ pla
+ sbc #0
+ sta sptr+1
+ jsr printhex
+ lda sptr
+ jmp printhex
+ rts
+
 _main:
- ldx #0
- stx COLCRS
- inx
+ ldx #1
  stx CRSINH
  lda #<str1
  ldx #>str1
@@ -32,6 +45,33 @@ _main:
  jsr printstr
  lda #'.'
  jsr printa
+ lda #EOL
+ jsr printa
+ lda #<addr_str
+ ldx #>addr_str
+ jsr printstr
+ jsr print_addr
+ lda #EOL
+ jsr printa
+ lda #<end_str
+ ldx #>end_str
+ jsr printstr
+ lda #>(end_addr-1)
+ jsr printhex
+ lda #<(end_addr-1)
+ jsr printhex
+ lda #<memlo_str
+ ldx #>memlo_str
+ jsr printstr
+ lda MEMLO+1
+ jsr printhex
+ lda MEMLO
+ jsr printhex
+ lda #<relax_str
+ ldx #>relax_str
+ jsr printstr
+ jsr junksub
+
 cycle:
  lda RTCLOK+2
  and #$f0
@@ -55,6 +95,30 @@ printchr:
  bne strloop
  rts
 
+ ; Subroutine: Print A register in hex.
+printhex:
+ pha           ; stash argument
+ lsr           ; shift right 4 times,
+ lsr           ;   to get the first hex digit
+ lsr           ;   (aka nybble) into the bottom
+ lsr           ;    4 bit positions.
+ jsr printxdig ; print the top nybble.
+ pla           ; restore original value...
+ and #$0f      ; mask off high nybble
+ ; fall through to print the 2nd digit.
+
+ ; Subroutine: Print a nybble (A=0 to $0f) in hex.
+printxdig:
+ ora #$30 ; 0-9 now ASCII...
+ cmp #$3a ; do we have A-F?
+ bcc xok  ; if not, don't adjust it
+ adc #$26 ; A-F now ASCII: $3a + $26 + 1 (carry always set) = $61 (a)
+xok:
+ ; fall through to print the digit.
+
+ ; Subroutine: Print ATASCII character in A.
+ ; Assumes IOCB #0 is opened on the E: device, which is how the
+ ; Atari boots up. Uses "call-by-RTS" (weird looking but standard).
 printa:
  tax
  lda ICPTH ; the print-one-byte vector for IOCB 0.
@@ -64,8 +128,21 @@ printa:
  txa
  rts
 
-str1: .byte "Hello",0
-str2: .byte "World",0
+junksub:
+ .repeat 512
+ nop
+ .endrep
+ lda #'O'
+ jsr printa
+ lda #'K'
+ jmp printa
+
+str1:      .byte "Hello",0
+str2:      .byte "World",0
+addr_str:  .byte EOL, "I am currently located at $",0
+end_str:   .byte      "My code ends at           $",0
+memlo_str: .byte EOL, "MEMLO is currently set to $",0
+relax_str: .byte EOL, EOL, "Watchen das blinkenlights...",EOL,0
 
  .ifndef RAW
 end_addr:
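printhex and printxdig in hello.s turn a nybble into ASCII without a lookup table. A Python model of the same arithmetic (hypothetical function names, not part of the commit) shows why ORing with $30 and then adding $26 plus the always-set carry yields the digits a-f:

```python
# Python model of printhex/printxdig from hello.s (illustration only).

def nybble_to_ascii(n):
    """Convert 0-15 to one lowercase hex digit, the way printxdig does."""
    c = n | 0x30          # ora #$30: 0-9 become "0"-"9"
    if c >= 0x3A:         # cmp #$3a / bcc xok: is it A-F?
        c += 0x26 + 1     # adc #$26 with carry set: $3A -> $61 ("a")
    return chr(c)


def printhex(a):
    """Return the two characters printhex prints for the byte in A."""
    return nybble_to_ascii(a >> 4) + nybble_to_ascii(a & 0x0F)


if __name__ == "__main__":
    memlo = 0x1CFC
    # hello.s prints MEMLO+1 (high byte) first, then MEMLO (low byte).
    print("$" + printhex(memlo >> 8) + printhex(memlo & 0xFF))  # $1cfc
```

Because $3A + $27 = $61, the digits above 9 come out lowercase, so hello.s prints addresses like "1cfc" rather than "1CFC".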
diff --git a/mkrelocxex.pl b/mkrelocxex.pl
index e4b8a10..a871cd5 100755
--- a/mkrelocxex.pl
+++ b/mkrelocxex.pl
@@ -62,7 +62,15 @@ sub print_table {
 printf("lo start/end: \$%04x/\$%04x\n", $start, $end);
 printf("hi start/end: \$%04x/\$%04x\n", $hi_start, $hi_end);
 
-if(($hi_start != ($start + 0x0102)) || ($hi_end != ($end + 0x0102))) {
+if(($start % 0x100) || ($hi_start % 0x100)) {
+	die "starting address not on a page boundary\n";
+}
+
+if($start != ($hi_start - 0x100)) {
+	die "starting addresses not one page apart\n";
+}
+
+if(($hi_start != ($start + 0x0100)) || ($hi_end != ($end + 0x0100))) {
 	die "mismatched segment lengths\n";
 }
 
@@ -74,21 +82,19 @@ for($i = 0; $i < @bytes; $i++) {
 	next if $a == $b;
 	if($b == ($a + 1)) {
 		push @hi_table, ($i + $start);
-	} elsif($b == ($a + 2)) {
-		push @lo_table, ($i + $start);
 	} else {
-		die "invalid difference (not 1 or 2)\n";
+		die "invalid difference (not 0 or 1)\n";
 	}
 }
 
 push(@hi_table, 0);
-push(@lo_table, 0);
 
 print_table("hi", \@hi_table);
-print_table("lo", \@lo_table);
+print "table size: " . (@hi_table * 2) . " bytes\n\n";
 
 ($istart, $iend) = read_header($lo);
-warn "istart $istart  iend $iend\n";
+
+#warn "istart $istart  iend $iend\n";
 if($istart == 0x2e2 && $iend == 0x2e3) {
 	$init = read_word($lo);
 }
@@ -96,7 +102,7 @@ if($istart == 0x2e2 && $iend == 0x2e3) {
 # OK, make the output file now...
 print $out chr(0xff);
 print $out chr(0xff);
-warn $start;
+#warn $start;
 print $out chr($start & 0xff);
 print $out chr($start >> 8);
 print $out chr($end & 0xff);
@@ -126,14 +132,9 @@ for(@hi_table) {
 	$rcode .= chr($_ >> 8);
 }
 
-for(@lo_table) {
-	$rcode .= chr($_ & 0xff);
-	$rcode .= chr($_ >> 8);
-}
-
 $rend = $rstart + length($rcode) - 1;
 
-warn "$rstart $rend " . length($rcode);
+#warn "$rstart $rend " . length($rcode);
 
 # don't really need a ffff header, makes it easier to read hexdumps.
 print $out chr(0xff);
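With both assemblies exactly one page apart, mkrelocxex.pl only has to look for bytes that differ by one. A Python sketch of that comparison, assuming each image is the raw bytes of the main segment (build_hi_table is a hypothetical name; the real script is Perl and also writes the output XEX):

```python
# Python model of the table-building step in mkrelocxex.pl (illustration only).
# lo_image/hi_image are the code bytes assembled at lo_start and hi_start.

def build_hi_table(lo_image, lo_start, hi_image, hi_start):
    """Return the $0000-terminated table of little-endian words."""
    if lo_start % 0x100 or hi_start % 0x100:
        raise ValueError("starting address not on a page boundary")
    if hi_start != lo_start + 0x100:
        raise ValueError("starting addresses not one page apart")
    if len(lo_image) != len(hi_image):
        raise ValueError("mismatched segment lengths")

    entries = []
    for i, (a, b) in enumerate(zip(lo_image, hi_image)):
        if a == b:
            continue                      # not part of a relocatable address
        if b == a + 1:
            entries.append(lo_start + i)  # high byte of an absolute address
        else:
            raise ValueError("invalid difference (not 0 or 1)")
    entries.append(0)                     # end-of-table marker

    return b"".join(addr.to_bytes(2, "little") for addr in entries)


if __name__ == "__main__":
    # JSR $4003 / RTS, assembled at $4000 and again at $4100.
    lo = bytes([0x20, 0x03, 0x40, 0x60])
    hi = bytes([0x20, 0x03, 0x41, 0x60])
    table = build_hi_table(lo, 0x4000, hi, 0x4100)
    print("table size:", len(table), "bytes")  # one entry + terminator = 4
```

The returned table holds 2 bytes per entry, including the $0000 terminator, so a single fixup produces a 4-byte table.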
diff --git a/reloc.s b/reloc.s
index c4c3c58..3702430 100644
--- a/reloc.s
+++ b/reloc.s
@@ -11,13 +11,11 @@
  code_init  = end_addr+6
  table      = end_addr+8
 
- zp_addr    = FR0
- offset_lo  = zp_addr
- offset_hi  = zp_addr+1
- table_ptr  = zp_addr+2 ; 2 bytes
- dest_ptr   = table_ptr
- code_ptr   = zp_addr+4 ; 2 bytes
- fixup      = zp_addr+6
+ zp_addr      = FR0
+ offset_pages = zp_addr   ; 1 byte
+ table_ptr    = zp_addr+1 ; 2 bytes
+ dest_ptr     = table_ptr ; alias
+ code_ptr     = zp_addr+3 ; 2 bytes
 
  .org start_addr - 6
  .word $ffff
@@ -25,14 +23,19 @@
  .word end_addr - 1
 
 _main:
- lda code_start
- sec
- sbc MEMLO
+ ; if MEMLO isn't on a page boundary, move it up to the next
+ ; page, e.g. $1cfc => $1d00.
+ lda MEMLO
+ beq memlo_00
+ inc MEMLO+1
+ lda #0
+ sta MEMLO
+memlo_00:
 
- sta offset_lo
  lda code_start+1
+ sec
  sbc MEMLO+1
- sta offset_hi
+ sta offset_pages
 
  bcs memlo_ok
 
@@ -60,19 +63,37 @@ exitwait:
  rts
 
 memlo_ok:
- ; 1st fixup pass, hi bytes: table comes right after our code
- sta fixup
+ ; adjust addresses before moving the code
  lda #<table
  sta table_ptr
  lda #>table
  sta table_ptr+1
- jsr fixup_addrs
- 
- ; 2nd fixup pass, lo bytes: table_ptr already points to table
- lda offset_lo
- sta fixup
- jsr fixup_addrs
 
+fixup_addrs:
+ ldy #1
+ lda (table_ptr),y
+ sta code_ptr+1
+ dey
+ lda (table_ptr),y
+ sta code_ptr
+ inc table_ptr  ; point to next entry
+ bne tp1ok
+ inc table_ptr+1
+tp1ok:
+ inc table_ptr
+ bne tp2ok
+ inc table_ptr+1
+tp2ok:
+ ora code_ptr+1 ; quit if we hit $0000 in the table
+ beq fixup_done
+ lda (code_ptr),y ; Y still 0
+ sec
+ sbc offset_pages
+ sta (code_ptr),y
+ sec ; *should* already be set...
+ bcs fixup_addrs
+fixup_done:
+ 
  ; absolute addresses are fixed up, now move the code.
  lda code_start
  sta code_ptr
@@ -134,23 +155,18 @@ ceok:
  bcc do_init
 
  ; fix up RUNAD
- lda RUNAD
- sec
- sbc offset_lo
  lda RUNAD+1
- sbc offset_hi
+ sec
+ sbc offset_pages
+ sta RUNAD+1
 
 do_init:
  ; if there's an init address, call it (just like DOS would).
  lda code_init+1
  beq done          ; if hi byte is 0, assume lo byte is also 0.
 
- lda code_init     ; subtract offset
  sec
- sbc offset_lo
- sta code_init
- lda code_init+1
- sbc offset_lo
+ sbc offset_pages
  sta code_init+1
 
  jmp (code_init)
@@ -159,29 +175,6 @@ do_init:
 done:
  rts
 
-fixup_addrs:
- ldy #1
- lda (table_ptr),y
- sta code_ptr+1
- dey
- lda (table_ptr),y
- sta code_ptr
- inc table_ptr  ; point to next entry
- bne tp1ok
- inc table_ptr+1
-tp1ok:
- inc table_ptr
- bne tp2ok
- inc table_ptr+1
-tp2ok:
- ora code_ptr+1 ; quit if we hit $0000 in the table
- beq done
- lda (code_ptr),y ; Y still 0
- sec
- sbc fixup
- sta (code_ptr),y
- jmp fixup_addrs
-
 whoops_msg: .byte "MEMLO is too high! Press any key to exit.", EOL
  whoops_len = (*-whoops_msg)
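The new reloc.s logic fits in a few lines when modelled at a higher level. This is a Python sketch, not the relocator itself: memory is a 64K bytearray, relocate is a hypothetical name, the code end address is taken as inclusive, and the RUNAD and init fixups (which just subtract the same page count from each vector's high byte) are omitted. In the example, MEMLO is $1D80 and a JSR $4123 sits at $4000.

```python
# Python model of the new reloc.s (illustration only; relocate() is a
# hypothetical name). mem is a 64K bytearray standing in for Atari memory.

def relocate(mem, memlo, code_start, code_end, table_addr):
    """Patch and move the code; return (new start, new MEMLO)."""
    # Round MEMLO up to a page boundary, e.g. $1CFC -> $1D00.
    if memlo & 0xFF:
        memlo = (memlo & 0xFF00) + 0x100

    # Pages to subtract from every relocatable high byte.
    offset_pages = (code_start >> 8) - (memlo >> 8)
    if offset_pages < 0:
        raise RuntimeError("MEMLO is too high!")

    # Fix up the high bytes in place, before moving the code.
    ptr = table_addr
    while True:
        addr = mem[ptr] | (mem[ptr + 1] << 8)
        ptr += 2
        if addr == 0:                 # $0000 ends the table
            break
        mem[addr] = (mem[addr] - offset_pages) & 0xFF

    # Copy the patched code down to its new page-aligned home.
    length = code_end - code_start + 1
    mem[memlo:memlo + length] = mem[code_start:code_start + length]
    return memlo, memlo + length


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x4000:0x4003] = bytes([0x20, 0x23, 0x41])        # JSR $4123
    mem[0x4123] = 0x60                                     # print_banner: RTS
    mem[0x5000:0x5004] = bytes([0x02, 0x40, 0x00, 0x00])  # relocation table
    start, new_memlo = relocate(mem, 0x1D80, 0x4000, 0x4123, 0x5000)
    print(f"${start:04X} ${new_memlo:04X}")  # $1E00 $1F24
    print(mem[0x1E00:0x1E03].hex())          # 20231f, i.e. JSR $1F23
```

As in reloc.s, the table is walked before the copy, so the addresses in it still refer to the original load location.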