Atari 8-Bit Self Relocator
--------------------------

This is a modified form of a technique I saw in Bill Wilkinson's
Insight: Atari column in Compute! magazine (Issue 21, Feb 1982). It
creates Atari executables that relocate themselves to just above MEMLO.

To build the relocator and run the demo, you'll need:

- cc65 from https://cc65.github.io/
- axe from https://slackware.uk/~urchlay/repos/bw-atari8-tools

...as well as standard Linux packages like make and gcc.

To build the demo, just type "make". The result is "reloc.atr", which
is an Atari disk image with DOS 2.0S and the relocatable program as
AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
There's also "reloc25.atr", which is the same thing except it uses
DOS 2.5 (with MEMLO a bit higher).

The demo shows "Hello World" with changing colors, along with its own
load address, end address, and the current MEMLO. The important part is
that it got relocated to just above MEMLO and run from there. The code
itself isn't relocatable (see the source, "hello.s"): the relocator
adjusted all the absolute addresses on the fly, at load time.

There's also "native.atr", which is a DOS 2.0S bootable disk with the
relocator compiled for the Atari, as MKRELOC.XEX. This will load with
DOS's L command, and will read LO.XEX and HI.XEX (which are
non-relocatable) and create a relocatable AUTORUN.SYS. Reboot to see
the demo run.

Usage
-----

To create relocating executables of your own software, you can use
either a modern system running a 6502 cross-assembler (atasm, xa65,
ca65, dasm, etc.) or an Atari 8-bit.

First, write your code. There are some limitations:

- All your code and data must be in a single segment. Generally this
  means: set the origin only once, and don't use *= or .org again until
  the end (for RUNAD and/or INITAD).
- Your code's origin (start address) must be exactly on a page boundary
  ($xx00), at $2800 or higher.
- You can use only one init address.
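
Before running mkreloc, it can help to check lo.xex against these limitations. The C sketch below is not part of mkreloc; `check_xex` and its result codes are hypothetical names. It assumes the standard Atari executable layout (an optional $FFFF marker, then start and end addresses as little-endian words, then the data), treats any segment lying within RUNAD/INITAD ($02E0-$02E3) as a vector segment, and also applies the $71C0 end-address limit described in the notes under "How it works".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RUNAD      0x02E0u
#define INITAD     0x02E2u
#define MIN_ORG    0x2800u  /* lowest allowed origin */
#define RELOC_ADDR 0x71C0u  /* code must end below the relocator */

/* Hypothetical result codes for this sketch only. */
enum xex_check {
    XEX_OK = 0,
    XEX_MALFORMED,
    XEX_NO_CODE_SEGMENT,
    XEX_MULTIPLE_CODE_SEGMENTS,
    XEX_MULTIPLE_INITS,
    XEX_BAD_ORIGIN,   /* not on a page boundary, or below $2800 */
    XEX_TOO_LONG      /* code reaches $71C0 or beyond */
};

/* Read a little-endian 16-bit word. */
static unsigned rd16(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Walk the segments of an Atari executable held in buf[0..len) and
   check the single-segment, page-aligned-origin and one-init rules.
   On success, *org and *end hold the code segment's first and last
   addresses. */
enum xex_check check_xex(const uint8_t *buf, size_t len,
                         unsigned *org, unsigned *end)
{
    size_t pos = 0;
    int code_segs = 0, init_segs = 0;

    assert(buf != NULL && org != NULL && end != NULL);

    while (pos < len) {
        if (len - pos < 2)
            return XEX_MALFORMED;
        unsigned start = rd16(buf + pos);
        if (start == 0xFFFFu) {          /* $FFFF marker, no data follows */
            pos += 2;
            continue;
        }
        if (len - pos < 4)
            return XEX_MALFORMED;
        unsigned stop = rd16(buf + pos + 2);
        pos += 4;
        if (stop < start || len - pos < (size_t)(stop - start) + 1)
            return XEX_MALFORMED;

        if (start >= RUNAD && stop <= INITAD + 1) {
            /* RUNAD and/or INITAD vector segment */
            if (start <= INITAD && stop >= INITAD + 1)
                init_segs++;
        } else {
            code_segs++;
            *org = start;
            *end = stop;
        }
        pos += (size_t)(stop - start) + 1;
    }

    if (code_segs == 0) return XEX_NO_CODE_SEGMENT;
    if (code_segs > 1)  return XEX_MULTIPLE_CODE_SEGMENTS;
    if (init_segs > 1)  return XEX_MULTIPLE_INITS;
    if ((*org & 0xFFu) != 0 || *org < MIN_ORG) return XEX_BAD_ORIGIN;
    if (*end >= RELOC_ADDR) return XEX_TOO_LONG;
    return XEX_OK;
}
```

A file whose only segments are the code at $4000 and an INITAD vector passes; a second code segment, or an origin such as $4010, is rejected.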

Once your code is written and tested:

- Assemble the code at a start address of $2800 or higher, as a regular
  Atari executable (.xex/.com/.bin file). The executable must be called
  "lo.xex" if you're cross-assembling, or "D:LO.XEX" if you're using
  the Atari.
- Change the start address (*= or .org directive) so that it's one page
  higher. If you used $2800, you'd change it to $2900.
- Assemble the code again, to an executable called "hi.xex", or
  "D:HI.XEX" on the Atari.
- Make sure you have the reloc.xex (D:RELOC.XEX) and mkreloc
  (D:MKRELOC.XEX) files in the same directory (or on the same disk).
- Run the relocator generator, mkreloc. On a modern system the command
  will be "./mkreloc" (or possibly just "mkreloc" if you installed it
  somewhere on your $PATH). On the Atari, load D:MKRELOC.XEX from the
  DOS menu.
- If you're using an Atari, wait a bit. Listen to the disk I/O beeps...
  when it's finished, you'll be back at the DOS menu. You will have a
  brand-new AUTORUN.SYS, which is the self-relocating version of your
  program. You can reboot to run it.
- If you're on a modern system, you'll have a (lowercase) autorun.sys,
  which you can copy to a DOS disk image. You can also test-run it by
  loading it directly in an emulator (e.g. "atari800 autorun.sys"), if
  it can run without DOS.

How it works
------------

You assemble the code twice. The second time around, you set the origin
one page higher than the first. You then have two executables that are
identical except for the high bytes of absolute addresses within the
code (which differ by one). Based on this information, the relocator
can move the code to just above MEMLO and adjust all the addresses so
it'll actually run in its new location.

Unfortunately, the code can only be relocated by multiples of 256
bytes. The low bytes aren't adjusted. So unless MEMLO happens to
contain $FF in its low byte, some memory will be wasted (up to 255
bytes).

The code from Insight: Atari doesn't produce self-relocating
executables.
Instead, it produces BASIC programs that hold the relocatable object
code as DATA statements, POKEd into memory when run. The relocator
presented here gets appended to your standard executable, relocates it
"on the fly", then hands control to the (relocated) init and/or run
address of the relocated code.

Example: a subroutine call within our own code:

    JSR print_banner

This is the first instruction in our program. Say we assemble at $4000,
so it will be found at $4000 for the first assembly, and at $4100 for
the second. Say print_banner ends up at $4123 when we assemble at
$4000, and at $4223 when assembling at $4100.

Further, we determine MEMLO has $1D80. So, when we relocate the
program, it ends up at $1E00 (the start of the next page). The target
of the JSR instruction has to be adjusted to match the new location
where print_banner is going to be: $4123 - $4000 + $1E00 = $1F23. After
relocation, the JSR $4123 reads JSR $1F23.

We'll call the code that does the relocation the relocator. The term
"relocating loader" is used elsewhere, but it's not accurate here: DOS
is the loader, and we're not replacing it.

The relocator is a small routine that gets appended to the first
executable (the $4000 one) as a segment, along with two data tables.
The first table is 8 bytes, and holds the original load address, end
address, run address, and init address. The other is a bitmap of the
addresses in the program, one bit per byte of the program. The bit is
set to 1 if that address needs relocating, or 0 if not. The tables are
followed by an INITAD segment that runs the relocator code.

The relocator and the tables have to load at a fixed address, but once
they're finished running, they won't be needed again.

The relocator has to know the load address and the length of the
"payload" segment of the program (the part it's going to relocate). At
load time, it gets run via INITAD. What it does:

1. Add 1 to the high byte of MEMLO (giving the first page boundary
   above MEMLO), and subtract the result from the high byte of the load
   address ($40 in the example). This gives us a positive number (we
   hope!) that is the amount each address's high byte in the program
   should have subtracted from it. With MEMLO at $1D80, that's
   $40 - ($1D + 1) = $22.
2. Loop over the code to be relocated, copying it to the new address
   (start of the first page above MEMLO). As each byte is moved, it's
   also adjusted (has the offset subtracted from it) if its bit in the
   relocation table is set.
3. Set MEMLO to point to the byte after the end of the program, to
   protect it from being overwritten by e.g. BASIC or ASM/ED.
4. If the program has an init address, subtract the offset from its
   high byte, then jump to it. This runs the payload program's init
   routine.
5. If the program has a run address, subtract the offset from its high
   byte, storing the result in RUNAD. Then do an RTS to hand control
   back to DOS. DOS will run the relocated code by jumping to the
   altered RUNAD, in the usual way.

Notes:

- If your program is a device driver or a "TSR", you should use an init
  address, NOT a run address. This allows users to append your program
  to e.g. an RS-232 driver, and maybe a RAMdisk driver too, etc. Each
  driver should have an init address, because Atari executables can
  have multiple init addresses (but only the last run address loaded
  takes effect).
- If your program is an application, it's usually better to use a run
  address. If you use an init address, your program will run, but DOS
  will still be "in the middle of" loading the executable, meaning
  IOCB #1 will still be open for reading.
- The program's end address must be below $71C0, since that's where the
  relocator and tables load. The reason for this restriction is to
  allow the relocatable executable to work with a 16K cartridge. The
  lowest sane start address for the program is probably $2800, which
  allows the program to be about 18.4KB (18880 bytes) in size...
  though $3000 is a lot safer (about 16.4KB, or 16832 bytes, max).
- Whatever start address (ORG) you use for the program, it has to be
  higher than the current MEMLO when the relocation is done.
  That's why I said $3000 is safer than $2800: if someone uses a fancy
  DOS and/or has lots of device drivers loaded, MEMLO could exceed
  $2800, which would cause your program to crash when loaded.
- Also, the start address has to be on a page boundary ($xx00). Since
  the code gets relocated to another page boundary, JMP (indirect) is
  safe to use: if the operand doesn't cross a page boundary, it still
  won't after it's relocated.
- The original Wilkinson scheme was done entirely in Atari BASIC. I use
  a C program to create the relocation tables, and the relocator itself
  becomes part of the relocatable program, so BASIC is not required.
  The relocator generator will run either on the Atari or on a modern
  POSIX system.
- The Insight: Atari article mentions that OSS languages use a scheme
  like this to relocate themselves when loaded. The released sources
  for the OSS languages include a BASIC XL program that generates the
  bitmaps.
- If your code has really tight cycle-counted timing loops, the timing
  might get thrown off if relocation causes a branch to cross a page
  boundary when it originally didn't. This kind of code generally only
  belongs in games and demos. Relocatable code is usually used for
  things like device drivers or programming utilities. Games "take
  over" the whole machine and don't have to care about MEMLO or other
  software needing free RAM.

Format of the relocatable executable
------------------------------------

- Segment with the original code, at the original load address. This
  is actually a copy of the first segment of lo.xex.
- Segment with the relocator code (from reloc.xex) and relocation
  tables.
- INITAD segment that runs the relocator code.

Note that the original RUNAD and INITAD segments (if any) don't appear
in the relocatable file as segments.

Relocation tables start immediately after the last byte of the
relocator.
The first table is 8 bytes (4 words):

- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)

The next N bytes are the relocation bitmap table. See below.

For the init address, if it's not zero, the relocator jumps to it (at
its new location). As usual, when the init code is done, it exits with
an RTS, which will hand control back to DOS.

For the run address, if it's not zero, the relocator adjusts RUNAD,
and DOS uses RUNAD as usual when the program's done loading. Again,
the relocator's RTS returns control to DOS.

Example:

    COLOR2 = $02C6          ; OS color shadow register (not relocated)
    CRSINH = $02F0          ; OS cursor-inhibit flag (not relocated)
    INITAD = $02E2

        *=$4000
    start:
        jsr set_color       ; $4000 JSR $4007
        jsr set_cursor      ; $4003 JSR $400E
        rts                 ; $4006
    set_color:
        lda bgcolor         ; $4007 LDA $4015
        sta COLOR2          ; $400A
        rts                 ; $400D
    set_cursor:
        lda cursor          ; $400E LDA $4016
        sta CRSINH          ; $4011
        rts                 ; $4014
    bgcolor: .byte $00      ; $4015
    cursor:  .byte $01      ; $4016

        *=INITAD
        .word start

The address table for the above program:

    $00 $40 - code_start
    $16 $40 - code_end
    $00 $00 - code_run (no run address)
    $00 $40 - code_init

Relocation bitmap table, in binary:

    table byte:   addresses:
    00100100      $4000 to $4007
    01000000      $4008 to $400F
    10000000      $4010 to $4017

The bits are read left to right. The first 1 bit is for address $4002,
which is the high byte of the first JSR's operand. The four 1 bits mark
$4002, $4005, $4009 and $4010: the high bytes of the two JSR operands
and the two LDA operands. COLOR2 and CRSINH are OS addresses, so their
operands aren't adjusted. The last byte of the table actually extends
past the end of the program; extra bits in the last byte are set to 0.

The bitmap table is always 1/8 the size of the code, rounded up to the
next byte. It might be possible someday to save space by letting the
table end early, if e.g. the last part of the program is fully
relocatable code (or data). Currently this isn't done, and I'm not
sure it's worth the extra complexity to implement.

The program loads from $4000 to $4016. If MEMLO was $1CFC, the
relocator will move the program to $1D00 - $1D16 and set MEMLO to
$1D17. The operand of the first instruction (was JSR $4007) will be
altered to $1D07 (aka $4007 - $4000 + $1D00), which is the address that
the subroutine got relocated to.
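
The relocation walkthrough above can be modeled on a modern system. The C sketch below is an illustration, not the code in mkreloc or reloc.xex; `make_bitmap`, `reloc_target`, `pack_addr_table` and `relocate` are hypothetical names. It builds the bitmap by comparing the lo and hi code images byte for byte, packs the 8-byte address table, and applies the relocator's copy-and-adjust step to a copy of the code. It assumes the two code images have already been extracted from lo.xex and hi.xex, and that MEMLO is below the load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the relocation bitmap from the code assembled at ORG (lo) and
   at ORG+$100 (hi). A byte that differs is the high byte of an absolute
   address into the program, so its bit is set. Bits are stored high bit
   first. Returns the bitmap size in bytes. */
size_t make_bitmap(const uint8_t *lo, const uint8_t *hi, size_t len,
                   uint8_t *bitmap)
{
    size_t nbytes = (len + 7) / 8;      /* 1/8 the code, rounded up */
    memset(bitmap, 0, nbytes);          /* unused trailing bits stay 0 */
    for (size_t i = 0; i < len; i++) {
        if (lo[i] != hi[i]) {
            /* assumption: the two assemblies differ only by +1 in high bytes */
            assert((uint8_t)(hi[i] - lo[i]) == 1);
            bitmap[i / 8] |= (uint8_t)(0x80u >> (i % 8));
        }
    }
    return nbytes;
}

/* Target address: the start of the first page above MEMLO. */
unsigned reloc_target(unsigned memlo)
{
    return ((memlo >> 8) + 1) << 8;
}

/* Pack load, end, run and init as little-endian words (0 = none). */
void pack_addr_table(uint8_t table[8], unsigned load, unsigned end,
                     unsigned run, unsigned init)
{
    const unsigned words[4] = { load, end, run, init };
    for (int i = 0; i < 4; i++) {
        table[2 * i]     = (uint8_t)(words[i] & 0xFF);
        table[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
}

/* Copy len bytes of code assembled at org to dest (which will run at
   new_org), subtracting the page offset from each byte whose bitmap bit
   is set. Returns the new MEMLO: the byte after the relocated program. */
unsigned relocate(const uint8_t *code, size_t len, const uint8_t *bitmap,
                  unsigned org, unsigned new_org, uint8_t *dest)
{
    uint8_t offset = (uint8_t)((org >> 8) - (new_org >> 8));
    for (size_t i = 0; i < len; i++) {
        uint8_t b = code[i];
        if (bitmap[i / 8] & (0x80u >> (i % 8)))
            b = (uint8_t)(b - offset);
        dest[i] = b;
    }
    return new_org + (unsigned)len;
}
```

Fed the 23 bytes of the example program (and a copy with the four high bytes bumped to $41) with MEMLO at $1CFC, the sketch yields the bitmap bytes $24, $40, $80, turns the first JSR's operand into $1D07, and returns $1D17 as the new MEMLO, matching the walkthrough.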

The original program assembles to a 35-byte file. The relocatable
version is around 230 bytes: 29 bytes for the original file (minus its
INITAD segment), 183 bytes for the relocator code, 8 bytes for the
address table, 3 bytes for the relocation bitmap, and 10 bytes of
segment headers (including the new INITAD segment). However, the
relocator and tables are only used once, and can be overwritten
afterwards (so they count as free memory).

Relocation Table Format
-----------------------

Bitmap. One bit per byte of the program's code, read from high bit to
low: 1 if the address needs adjusting, 0 if not.

The relocator is 256 bytes long or less. The GR.0 display list with a
16K cartridge inserted is at $7C20. We want the bitmap to end before
$7C00.

The bitmap table will always be 1/8 the code size (rounded up). If your
code is 18880 bytes, the bitmap size is 2360 bytes. Supposing you ORG
at $2800:

    code           - $2800 to $71BF
    relocator      - $71C0 to $72BF
    8-byte table   - $72C0 to $72C7
    bitmap         - $72C8 to $7BFF

18880 bytes is the maximum code size (with ORG $2800). Actually, the
relocator is only 183 bytes, and the table could extend to $7C1F
without overwriting the display list.
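
The memory-layout arithmetic above can be checked with a few lines of C. This is a sketch, assuming the fixed addresses from this section ($71C0 for the relocator, 256 bytes reserved for it, the 8-byte table after that, and the bitmap ending before $7C00); `layout_fits` is a hypothetical helper, not part of mkreloc.

```c
#include <assert.h>
#include <stddef.h>

#define RELOC_ADDR   0x71C0u                     /* relocator load address */
#define RELOC_SPACE  0x100u                      /* room reserved for the relocator */
#define ADDR_TABLE   (RELOC_ADDR + RELOC_SPACE)  /* $72C0, 8 bytes */
#define BITMAP_ADDR  (ADDR_TABLE + 8u)           /* $72C8 */
#define BITMAP_LIMIT 0x7C00u                     /* bitmap must end before this */

/* Returns nonzero if len bytes of code loaded at org end below the
   relocator and their bitmap ends below BITMAP_LIMIT. The address of
   the bitmap's last byte is stored in *bitmap_last. */
int layout_fits(unsigned org, unsigned len, unsigned *bitmap_last)
{
    unsigned bitmap_len = (len + 7u) / 8u;   /* 1/8 the code, rounded up */

    assert(bitmap_last != NULL && len > 0);
    *bitmap_last = BITMAP_ADDR + bitmap_len - 1u;
    return org + len <= RELOC_ADDR && BITMAP_ADDR + bitmap_len <= BITMAP_LIMIT;
}
```

With ORG $2800 the two limits meet exactly at 18880 bytes: the code ends at $71BF and the bitmap at $7BFF. With ORG $3000 the code limit is reached first, at 16832 bytes.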