Atari 8-Bit Self Relocator
--------------------------

This is a modified form of a technique I saw in Bill Wilkinson's
Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).

To build the relocator and run the demo, you'll need:

- cc65 from https://cc65.github.io/
- axe from https://slackware.uk/~urchlay/repos/bw-atari8-tools

...as well as standard Linux packages like make and perl.

To build, just type "make". The result is "reloc.atr", which is an
Atari disk image with DOS 2.0S and the relocatable program as
AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.

The demo shows "Hello World" with changing colors, along with its own
load address, end address, and the current MEMLO. The important part
is that it got relocated to MEMLO and run from there. The code isn't
relocatable by itself (see the source, "hello.s"). The relocator
adjusted all the absolute addresses on the fly (at load time).

How it works
------------

You assemble the code twice. The 2nd time around, you set the origin
one page higher than the first. You have two executables that are
identical except for the high bytes of absolute addresses within the
code (which differ by one). Based on this information, the relocator
can move the code to just above MEMLO and adjust all the addresses so
it'll actually run in its new location.

Unfortunately, the code can only be relocated by multiples of 256
bytes. The low bytes aren't adjusted. So unless MEMLO happens to
contain $FF in its low byte, some memory will be wasted (up to 255
bytes).

The code from Insight: Atari doesn't produce self-relocating
executables. What it produces is BASIC programs that have the
relocatable object code as DATA statements, POKEd into memory when
run. The relocator presented here gets appended to your standard
executable and relocates it "on the fly", then jumps to the start of
the relocated code.
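Here's a rough host-side sketch of the idea in Python (not the perl
script from this repo, and not the 6502 relocator itself). It diffs
two assemblies of the same code to find the high bytes, then simulates
the fixup. The names build_reloc_table and relocate are made up for
this illustration, and it assumes the target is always the page after
the one MEMLO points into.

```python
def build_reloc_table(img_a: bytes, img_b: bytes, org: int) -> list[int]:
    """List the addresses of bytes that differ between two assemblies.

    img_a is the code assembled at org; img_b is the same source
    assembled one page ($100 bytes) higher.
    """
    if len(img_a) != len(img_b):
        raise ValueError("both assemblies must be the same length")
    table = []
    for i, (a, b) in enumerate(zip(img_a, img_b)):
        if a == b:
            continue
        # A differing byte should be a high byte that went up by exactly one.
        if (b - a) & 0xFF != 1:
            raise ValueError(f"unexpected difference at ${org + i:04X}")
        table.append(org + i)
    return table


def relocate(img_a: bytes, org: int, table: list[int], memlo: int):
    """Simulate the relocator; returns (new_org, patched_image)."""
    # Assumption: always use the page after the one MEMLO points into.
    new_org = ((memlo >> 8) + 1) << 8
    offset = (org >> 8) - (new_org >> 8)   # pages to subtract from each high byte
    if offset < 0:
        raise ValueError("ORG must not be below the relocation target")
    out = bytearray(img_a)
    for addr in table:
        out[addr - org] = (out[addr - org] - offset) & 0xFF
    return new_org, bytes(out)


# Tiny made-up program: a JSR to the RTS that follows it.
at_4000 = bytes([0x20, 0x03, 0x40, 0x60])   # JSR $4003 / RTS
at_4100 = bytes([0x20, 0x03, 0x41, 0x60])   # JSR $4103 / RTS
table = build_reloc_table(at_4000, at_4100, 0x4000)
new_org, patched = relocate(at_4000, 0x4000, table, 0x1D80)
print([f"${a:04X}" for a in table], f"${new_org:04X}", patched.hex())
```

Only the byte at $4002 differs between the two assemblies, so that's
the only table entry. With MEMLO at $1D80 the code lands at $1E00 and
the JSR operand becomes $1E03.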
Example: a subroutine call to within our own code:

    JSR print_banner

This is the first instruction in our program, so it will be found at
$4000 for the first assembly pass, and $4100 for the second. Say
print_banner ends up at $4123 when we assemble at $4000, and $4223
when assembling at $4100.

Further, say MEMLO contains $1D80. So, when we relocate the program,
it ends up at $1E00 (the start of the next page). The target of the
JSR instruction has to be adjusted to match the new location where
print_banner is going to be. After relocation, the JSR $4123 reads
JSR $1F23 (the high byte $41 minus the $22 pages we moved down).

The code that does the relocation, we'll call the relocator. The term
"relocating loader" is used elsewhere, but it's not accurate here: DOS
is the loader, and we're not replacing it.

The relocator is a small routine that gets appended to the first
executable (the $4000 one) as a segment, plus two data tables (one for
the original ORG, end address, run, and init addresses, the other with
the addresses that need adjusting), plus an INITAD segment that runs
the relocator code. These all have to load at a fixed address, but
once they're finished running, they won't be needed again.

The relocator has to know the load address and the length of the main
segment of the program (the part it's going to relocate).

What it does:

1. Add 1 to the high byte of MEMLO (giving the first page above
   MEMLO), then subtract that from the high byte of the load address
   ($40 in the example, so $40 - $1E = $22). This gives us a positive
   number (we hope!) that is the amount each address's high byte in
   the program should have subtracted from it.

2. Iterate over the relocation data table, subtracting the offset from
   each byte it points to. Each table entry is the two-byte address of
   a byte that needs to be changed (the high byte of an absolute
   address that's "baked" into the program).

3. Move the main segment to the start of the first page above MEMLO.

4. Set MEMLO to point to the byte after the end of the program to
   protect it from being overwritten by e.g. BASIC or ASM/ED.

5. If the program has an init address, subtract the offset from its
   high byte, then JSR to it. This runs the payload program's init
   routine.

6. If the program has a run address, subtract the offset from its high
   byte, storing the result in RUNAD. Then do an RTS to hand control
   back to DOS. DOS will run the relocated code by jumping to the
   altered RUNAD, in the usual way.

Notes:

- To keep things simple, the program must consist of a single segment
  of code and data, followed by an init address and/or a run address.

- If your program is a device driver or a "TSR", you should use an
  init address, NOT a run address. This allows users to append your
  program to e.g. an RS-232 driver, and maybe a RAMdisk driver too,
  etc. Each driver should have an init address, because Atari
  executables can have multiple init addresses, but only one run
  address.

- If your program is an application, it's usually better to use a run
  address. If you use an init address, your program will run, but DOS
  will still be "in the middle of" loading the executable, meaning
  IOCB #1 will still be open for reading.

- The program's end address must be below $6C00, since that's where
  the relocator and tables load. The reason for this restriction is
  to allow the relocatable executable to work with a 16K cartridge.
  The lowest sane start address for the program is probably $2000,
  which allows the program to be 19KB in size... though $3000 is a
  lot safer (15KB max).

- Whatever start address (ORG) you use for the program, it has to be
  higher than the current MEMLO when the relocation is done. That's
  why I said $3000 is safer than $2000: if someone uses a fancy DOS
  and/or has lots of device drivers loaded, MEMLO could exceed $2000,
  which would cause your program to crash when loaded.

- Also, the start address has to be on a page boundary ($xx00).

- The data table size must not exceed 4K. The table is compressed; see
  "Relocation Table Format", below.

- The original Wilkinson scheme was done entirely in Atari BASIC.
  I use a perl script to create the relocation tables and the
  relocator itself becomes part of the relocatable program, so BASIC
  is not required. The perl script will be rewritten in C at some
  point, and the C program will run on either the Atari or on a
  modern POSIX system.

- Indirect JMP instructions should always be used with care on the
  6502. The two bytes of the vector being jumped through have to be
  in the same page, due to a 6502 bug. Most 6502 asm programmers know
  how to handle this... but with dynamically relocatable code, there's
  not really a good way to do it. Best to avoid indirect JMPs. One
  simple workaround is to use self-modifying code: Have an absolute
  JMP instruction in your code, and store the indirect jump's
  destination there. Example:

      JMP (VECTOR)

  ...becomes:

      LDA VECTOR
      STA TRAMPOLINE+1
      LDA VECTOR+1
      STA TRAMPOLINE+2
      JMP TRAMPOLINE

      ; somewhere in the code you have this:
  TRAMPOLINE
      JMP $0000

  Another way to do it would be to use call-by-RTS (push the jump
  address minus one on the stack, then execute RTS).

- If your code has really tight cycle-counted timing loops, the timing
  might get thrown off due to relocation causing a branch to cross a
  page boundary, when it was originally not supposed to. This kind of
  code generally only belongs in games and demos. Relocatable code is
  usually used for things like device drivers or programming
  utilities. Games "take over" the whole machine and don't have to
  care about MEMLO or other software needing free RAM.

Format of the relocatable executable:

- Segment with the original code, at the original load address.
- Segment with the relocator code and relocation tables.
- INITAD segment that runs the relocator code.

Note that the original RUNAD and INITAD segments (if any) don't appear
in the relocatable file as segments.

Relocation tables start immediately after the last byte of the
relocator.
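A rough Python sketch of how those three segments could be glued into
one Atari binary load file. This is only an illustration, not the
build script used here: build_relocatable and segment are made-up
names, the relocator and table bytes are placeholders, and it assumes
the relocator's entry point is the first byte of its segment. The $FF
$FF header, little-endian start/end words, and INITAD at $02E2 are the
standard Atari DOS conventions; $6C00 is the fixed load address
mentioned in the notes.

```python
import struct

INITAD = 0x02E2       # DOS calls through this vector when the segment loads
RELOC_ADDR = 0x6C00   # fixed load address of the relocator and its tables


def segment(start: int, data: bytes, first: bool = False) -> bytes:
    """One binary-load segment: optional $FF $FF, start, end, then data."""
    header = b"\xff\xff" if first else b""
    end = start + len(data) - 1
    return header + struct.pack("<HH", start, end) + data


def build_relocatable(org: int, code: bytes,
                      relocator: bytes, tables: bytes) -> bytes:
    """Concatenate the code, relocator+tables, and INITAD segments."""
    blob = relocator + tables   # tables follow the relocator's last byte
    return (segment(org, code, first=True)
            + segment(RELOC_ADDR, blob)
            # Assumption: the relocator's entry point is its first byte.
            + segment(INITAD, struct.pack("<H", RELOC_ADDR)))


# Placeholder contents: a 1-byte "program", a 1-byte "relocator",
# and 10 bytes of table data.
exe = build_relocatable(0x4000, b"\x60", b"\x60", bytes(10))
print(len(exe), exe.hex())
```

Because the INITAD segment comes last, DOS has already loaded the code
and the tables by the time the relocator gets called.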
First table is 8 bytes (4 words):

- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)

The next N bytes are the high-byte relocation table. See below.

For the init address, if it's not zero, the relocator JSRs to it (at
its new location). For the run address, if it's not zero, the
relocator adjusts RUNAD, and DOS uses RUNAD as usual when the
program's done loading.

Example:

        *=$4000
    start:
        jsr set_color   ; $4000 JSR $4007
        jsr set_cursor  ; $4003 JSR $400E
        rts             ; $4006
    set_color:
        lda bgcolor     ; $4007 LDA $4015
        sta COLOR2      ; $400A
        rts             ; $400D
    set_cursor:
        lda cursor      ; $400E LDA $4016
        sta CRSINH      ; $4011
        rts             ; $4014
    bgcolor:
        .byte $00       ; $4015
    cursor:
        .byte $01       ; $4016

        *=INITAD
        .word start

The address table for the above program:

    $00 $40 - code_start
    $16 $40 - code_end
    $00 $00 - code_run (no run address)
    $00 $40 - code_init

High byte relocation table:

    $02 $40 ; hi byte of JSR $4007 operand
    $05 $40 ; hi byte of JSR $400E operand
    $09 $40 ; hi byte of LDA $4015 operand
    $10 $40 ; hi byte of LDA $4016 operand
    $00 $00 ; terminator

Program loads from $4000 to $4016. If MEMLO was $1CFC, the relocator
will move the program to $1D00 - $1D16 and set MEMLO to $1D17. The
operand of the first instruction (was JSR $4007) will be altered to
$1D07 (aka $4007 - $4000 + $1D00), which is the address that the
subroutine got relocated to.

The original program assembled to a 32-byte file. The relocatable
version will be around 400 bytes: 28 bytes for the original file
(minus its INITAD segment), ~300 bytes for the relocator code, 8 bytes
for the address table, and 10 bytes for the relocation table. However,
the relocator and tables are only used once, and can be overwritten
afterwards (so they count as free memory).

Relocation Table Format
-----------------------

Current implementation: A list of addresses that need to be adjusted
(high bytes of absolute addresses), 2 bytes each, terminated with
$00 $00.

Possible future implementation: Bitmap.
One bit per byte in the file. 1 if the byte needs adjusting, 0 if
not. This *probably* will actually be smaller than the list of
addresses. Also has the advantage of being a fixed size, easily
calculated/predicted.

The relocator is 256 bytes long or less. The GR.0 display list with a
16K cart in is at $7C20. We want the bitmap to end just below $7C00.

Bitmap table will always be 1/8 the code size. If your code is 18880
bytes, the bitmap size is 2360 bytes. Supposing you ORG at $2800, and
allowing the full 256 bytes for the relocator:

    code         - $2800 to $71BF
    relocator    - $71C0 to $72BF
    8-byte table - $72C0 to $72C7
    bitmap       - $72C8 to $7BFF
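The size and layout arithmetic for the bitmap idea can be checked with
a short Python sketch. None of this exists in the repo: the function
names are made up, and it assumes a full 256-byte slot for the
relocator, followed directly by the 8-byte table and then the bitmap.

```python
def list_table_size(entries: int) -> int:
    """Current format: 2 bytes per address plus the $00 $00 terminator."""
    return 2 * entries + 2


def bitmap_size(code_len: int) -> int:
    """Bitmap format: one bit per code byte, rounded up to whole bytes."""
    return (code_len + 7) // 8


def bitmap_layout(org: int, code_len: int,
                  relocator_len: int = 256, table_len: int = 8) -> dict:
    """Inclusive (start, end) address ranges, packed back to back."""
    reloc_start = org + code_len
    table_start = reloc_start + relocator_len   # assumed full-size slot
    bitmap_start = table_start + table_len
    bitmap_end = bitmap_start + bitmap_size(code_len) - 1
    return {
        "code": (org, reloc_start - 1),
        "relocator": (reloc_start, table_start - 1),
        "table": (table_start, bitmap_start - 1),
        "bitmap": (bitmap_start, bitmap_end),
    }


layout = bitmap_layout(0x2800, 18880)
for name, (start, end) in layout.items():
    print(f"{name:10s} ${start:04X} to ${end:04X}")

# The small example program: 23 bytes of code, 4 addresses to fix up.
print(list_table_size(4), bitmap_size(23))
```

With those assumptions, 18880 bytes of code ORGed at $2800 puts the
last bitmap byte at $7BFF. For the 23-byte example program, the list
format needs 10 bytes where a bitmap would need 3.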