How to do a self-relocating Atari 8-bit executable...

This is a modified form of a technique I saw in Bill Wilkinson's Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).

In the original scheme, you'd assemble the code twice, with the origin (start address) one page apart. Say, assemble at address $4000, then the 2nd time at $4100. Now, any bytes that differ by 1 between the two object files are the ones that need to be changed when relocating. Suppose you want to relocate to $2000: you just subtract $20 from all the bytes in the first file that are 1 less than the same byte in the 2nd file.

This works, and is simple enough. The limitation is, you can only relocate to an even page boundary. If you want to relocate to the bottom of memory (pointed to by MEMLO), you probably will waste a few bytes. In DOS 2.0S, I get $1CFC in MEMLO. Relocating to an even page boundary means the code goes at $1D00, and the 4 bytes from $1CFC to $1CFF are wasted. That's not so bad... but if I enable another drive in DOS, that bumps MEMLO up by 128 bytes, to $1D7C. Then my relocatable code ends up at $1E00, and I waste 132 bytes below that...

In the modified form presented here, the code is still assembled twice, but the 2nd pass is ORG'ed 258 ($0102) bytes higher than the first. Now we have bytes that differ by one (the high bytes of addresses) and others that differ by two (the low bytes).

Another, more serious limitation of the code from Insight: Atari is that it doesn't produce self-relocating executables. What it produces is BASIC programs that have the relocatable object code as DATA statements, POKEd into memory when run. The relocator presented here gets appended to your standard executable and relocates it "on the fly", then jumps to the start of the relocated code.

Example: a subroutine call to within our own code:

    JSR print_banner

This is the first instruction in our program, so it will be found at $4000 for the first assembly pass, and $4102 for the second.
Say print_banner ends up at $4123 when we assemble at $4000, and $4225 when assembling at $4102. Further, we find that MEMLO contains $1D80. So, when we relocate the program, it ends up at $1D80. The target of the JSR instruction has to be adjusted to match the new location where print_banner is going to be (here $1EA3, which is $4123 - $4000 + $1D80).

The code that does the relocation, we'll call the relocator. The term "relocating loader" is used elsewhere, but it's not accurate here: DOS is the loader, and we're not replacing it.

The relocator is a small routine that gets appended to the first executable (the $4000 one) as a segment, plus two data tables (one each for low and high bytes), as another 2 segments, plus an INITAD segment that runs the relocator code. These all have to load at a fixed address, but once they're finished running, they won't be needed again.

The relocator has to know the load address and the length of the main segment of the program (the part it's going to relocate). What it does:

1. Subtract the load address ($4000 in the example) from the contents of MEMLO. This gives us a negative number (we hope!) that is the amount each address in the program should have added to it.

2. Iterate over the two data tables, adding the offset. Each table entry is the two-byte address of a byte that needs to be changed (an absolute address that's "baked" into the program). The high and low bytes of the addresses in the code are handled separately (hence the two tables). The low byte of the offset is added to the bytes at the addresses in the low-byte table, and the high byte of the offset to the bytes at the addresses in the high-byte table.

3. Move the main segment to MEMLO.

4. Set MEMLO to point to the byte after the end of the program, to protect it from being overwritten by e.g. BASIC or ASM/ED.

5. Add the offset to the contents of RUNAD, which is the run address of the program, and then do an RTS to hand control back to DOS. DOS will run the relocated code by jumping to the altered RUNAD.

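
The five steps above can be modelled on a modern machine with a short C sketch. This is not the actual relocator (which is 6502 code); relocate(), peek16() and poke16() are hypothetical names, and memory is just a 64K byte array. One simplification is made on purpose: instead of two separate tables, the sketch takes one list of operand addresses and patches each operand as a single 16-bit add, assuming the high byte sits directly after the low byte. That way a carry out of the low byte reaches the high byte. The text above doesn't say how the two-table version deals with that carry, so treat this pairing as an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of the relocator steps. The real relocator is
   6502 code; every name here is made up for illustration. */

/* Read a little-endian 16-bit word from the simulated 64K address space. */
static uint16_t peek16(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}

/* Write a little-endian 16-bit word to the simulated 64K address space. */
static void poke16(uint8_t *mem, uint16_t addr, uint16_t value)
{
    mem[addr] = (uint8_t)(value & 0xFF);
    mem[(uint16_t)(addr + 1)] = (uint8_t)(value >> 8);
}

/* mem      : 65536-byte image of Atari memory
   load,end : original load and end (inclusive) addresses of the main segment
   operands : addresses of the low bytes of the 16-bit operands to fix,
              terminated by 0 (ASSUMPTION: high byte follows the low byte)
   memlo    : current contents of MEMLO
   runad    : in/out, run address to adjust (0 means "none")
   returns  : the new MEMLO value */
uint16_t relocate(uint8_t *mem, uint16_t load, uint16_t end,
                  const uint16_t *operands, uint16_t memlo, uint16_t *runad)
{
    assert(memlo < load);   /* the program must have been loaded above MEMLO */
    assert(end >= load);

    /* Step 1: offset = MEMLO - load address, kept modulo 65536. */
    uint16_t offset = (uint16_t)(memlo - load);

    /* Step 2: add the offset to every baked-in absolute address. */
    for (size_t i = 0; operands[i] != 0; i++)
        poke16(mem, operands[i], (uint16_t)(peek16(mem, operands[i]) + offset));

    /* Step 3: move the main segment down to MEMLO (regions may overlap). */
    uint16_t length = (uint16_t)(end - load + 1);
    memmove(&mem[memlo], &mem[load], length);

    /* Step 5: adjust the run address, if there is one. */
    if (*runad != 0)
        *runad = (uint16_t)(*runad + offset);

    /* Step 4: the new MEMLO is the byte after the relocated program. */
    return (uint16_t)(memlo + length);
}
```

With the print_banner example (JSR $4123 at $4000, program ending at $4123, MEMLO = $1D80), the offset works out to $DD80, the JSR operand becomes $1EA3, and the returned MEMLO is $1EA4.
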
Notes:

- To keep things simple, the program must consist of a single segment of code and data, followed by an init address and/or a run address.

- If your program is a device driver or a "TSR", you should use an init address, NOT a run address. This allows users to append your program to e.g. an RS-232 driver, and maybe a RAMdisk driver too, etc. Each driver should have an init address, because Atari executables can have multiple init addresses (but only one run address takes effect).

- If your program is an application, it's usually better to use a run address. If you use an init address, your program will run, but DOS will still be "in the middle of" loading the executable, meaning IOCB #1 will still be open for reading.

- The program's end address must be below $6C00, since that's where the relocator and tables load. The reason for this restriction is to allow the relocatable executable to work with a 16K cartridge. The lowest sane start address for the program is probably $2000, which allows the program to be 19KB in size... though $3000 is a lot safer (15KB max).

- Whatever start address (ORG) you use for the program, it has to be higher than the current MEMLO when the relocation is done. That's why I said $3000 is safer than $2000: if someone uses a fancy DOS and/or has lots of device drivers loaded, MEMLO could exceed $2000, which would cause your program to crash when loaded.

- The data tables' combined size must not exceed 4K. Generally the tables will be the same size, and each entry is 2 bytes, so this means you can't have more than about 1000 absolute references in your code. This doesn't count references that point outside your code, e.g. JMP CIOV or STA CRSINH; these aren't relocated (if they were, your program wouldn't work). As a reference, the 8K Atari BASIC cartridge would require 1522 bytes of data tables, if we were trying to relocate it.

- The original Wilkinson scheme was done entirely in Atari BASIC.
I use a C program to create the relocation tables, and the relocator itself becomes part of the relocatable program, so BASIC is not required. The C program can be run on either the Atari or on a modern POSIX system, which is especially useful if you use a cross-assembler to write and assemble your Atari code.

- Indirect JMP instructions should always be used with care on the 6502. The two bytes of the vector that the operand points to have to be in the same page, due to a 6502 bug. Most 6502 asm programmers know how to handle this... but with dynamically relocatable code, there's not really a good way to do it, because you can't know at assembly time where the vector will end up relative to a page boundary. Best to avoid indirect JMPs. One simple workaround is to use self-modifying code: have an absolute JMP instruction in your code, and store the indirect jump's destination there. Example:

    JMP (VECTOR)

...becomes:

    LDA VECTOR
    STA TRAMPOLINE+1
    LDA VECTOR+1
    STA TRAMPOLINE+2
    JMP TRAMPOLINE

    ; somewhere in the code you have this:
    TRAMPOLINE JMP $0000

Another way to do it would be to use call-by-RTS (push the jump address minus one on the stack, then execute RTS).

- If your code has really tight cycle-counted timing loops, the timing might get thrown off due to relocation causing a branch to cross a page boundary, when it was originally not supposed to. This kind of code generally only belongs in games and demos. Relocatable code is usually used for things like device drivers or programming utilities. Games "take over" the whole machine and don't have to care about MEMLO or other software needing free RAM.

Format of the relocatable executable:

- Segment with the original code, at the original load address.
- Segment with the relocator code and relocation tables.
- INITAD segment that runs the relocator code.

Note that the original RUNAD and INITAD segments (if any) don't appear in the relocatable file as segments.

Relocation tables start immediately after the last byte of the relocator.
First 8 bytes are 4 words:

- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)

The next N bytes are the high-byte relocation table. Each entry is a word, the address of a byte within the program that has to be relocated. The table ends with $0000.

The next N bytes are the low-byte table, same format as the high-byte table including the $0000 at the end.

The high and low byte tables will generally be the same size, but this is not a requirement.

For the init address, if it's not zero, the relocator JSR's to it (at its new location).

For the run address, if it's not zero, the relocator adjusts RUNAD, and DOS uses RUNAD as usual when the program's done loading.

Example:

        *=$4000
    start:
        jsr set_color    ; $4000 JSR $4007
        jsr set_cursor   ; $4003 JSR $400E
        rts              ; $4006
    set_color:
        lda bgcolor      ; $4007 LDA $4015
        sta COLOR2       ; $400A
        rts              ; $400D
    set_cursor:
        lda cursor       ; $400E LDA $4016
        sta CRSINH       ; $4011
        rts              ; $4014
    bgcolor: .byte $00   ; $4015
    cursor:  .byte $01   ; $4016

        *=INITAD
        .word start

The address table for the above program:

    $00 $40 - code_start
    $16 $40 - code_end
    $00 $00 - code_run (no run address)
    $00 $40 - code_init

High byte relocation table:

    $02 $40 ; hi byte of JSR $4007 operand
    $05 $40 ; hi byte of JSR $400E operand
    $09 $40 ; hi byte of LDA $4015 operand
    $10 $40 ; hi byte of LDA $4016 operand
    $00 $00 ; terminator

Low byte relocation table:

    $01 $40 ; lo byte of JSR $4007 operand
    $04 $40 ; lo byte of JSR $400E operand
    $08 $40 ; lo byte of LDA $4015 operand
    $0F $40 ; lo byte of LDA $4016 operand
    $00 $00 ; terminator

Program loads from $4000 to $4016. If MEMLO was $1CFC, the relocator will move the program to $1CFC - $1D12 and set MEMLO to $1D13. The operand of the first instruction (was JSR $4007) will be altered to $1D03 (aka $4007 - $4000 + $1CFC), which is the address that the subroutine got relocated to.

The original program assembled to a 32-byte file.
The relocatable version will be around 200 bytes: 28 bytes for the original file (minus its INITAD segment), ~128 bytes for the relocator code, 8 bytes for the address table, and 20 bytes for the two relocation tables. However, the relocator and tables are only used once, and can be overwritten afterwards (so they count as free memory).
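
To make the table layout concrete, here is a minimal C sketch that serializes the 4-word header followed by the high-byte and low-byte tables, each ending in $0000, as described in the format section. It is not the author's table-building program; build_tables() and put_word() are hypothetical names, and the sketch only covers writing the bytes out, not finding which bytes need relocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one little-endian word to buf at *pos (hypothetical helper). */
static void put_word(uint8_t *buf, size_t *pos, uint16_t w)
{
    buf[(*pos)++] = (uint8_t)(w & 0xFF);
    buf[(*pos)++] = (uint8_t)(w >> 8);
}

/* Hypothetical helper: write the header and both relocation tables into buf.
   Layout: load, end, run, init (4 words), then the high-byte table and the
   low-byte table, each terminated by a $0000 word.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t build_tables(uint8_t *buf, size_t cap,
                    uint16_t load, uint16_t end, uint16_t run, uint16_t init,
                    const uint16_t *hi, size_t n_hi,
                    const uint16_t *lo, size_t n_lo)
{
    size_t need = 8 + 2 * (n_hi + 1) + 2 * (n_lo + 1);
    if (need > cap)
        return 0;

    size_t pos = 0;
    put_word(buf, &pos, load);
    put_word(buf, &pos, end);
    put_word(buf, &pos, run);    /* 0 means "no run address" */
    put_word(buf, &pos, init);   /* 0 means "no init address" */

    for (size_t i = 0; i < n_hi; i++)
        put_word(buf, &pos, hi[i]);
    put_word(buf, &pos, 0x0000); /* high-byte table terminator */

    for (size_t i = 0; i < n_lo; i++)
        put_word(buf, &pos, lo[i]);
    put_word(buf, &pos, 0x0000); /* low-byte table terminator */

    assert(pos == need);
    return pos;
}
```

Fed the example program's values (load $4000, end $4016, run 0, init $4000, and the four entries of each table), this writes 28 bytes: 8 for the address table plus 20 for the two relocation tables, matching the sizes quoted above.
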