1 files changed, 89 insertions, 57 deletions
diff --git a/README.txt b/README.txt
index cb72884..493119a 100644
--- a/README.txt
+++ b/README.txt
@@ -4,17 +4,21 @@ Atari 8-Bit Self Relocator
 
 This is a modified form of a technique I saw in Bill Wilkinson's
 Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).
+It creates Atari executables that relocate themselves to just
+above MEMLO.
 
 To build the relocator and run the demo, you'll need:
 
 - cc65 from https://cc65.github.io/
 - axe from https://slackware.uk/~urchlay/repos/bw-atari8-tools
 
-...as well as standard Linux packages like make and perl.
+...as well as standard Linux packages like make and gcc.
 
-To build, just type "make". The result is "reloc.atr", which is
-an Atari disk image with DOS 2.0S and the relocatable program as
+To build the demo, just type "make". The result is "reloc.atr", which
+is an Atari disk image with DOS 2.0S and the relocatable program as
 AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
+There's also "reloc25.atr", which is the same thing except it uses
+DOS 2.5 (which leaves MEMLO a bit higher).
 
 The demo shows "Hello World" with changing colors, along with its own
 load address, end address, and the current MEMLO. The important part
@@ -22,6 +26,54 @@ is that it got relocated to MEMLO and run from there. The code isn't
 relocatable (see the source, "hello.s"). The relocator adjusted all
 the absolute addresses on the fly, at load time.
 
+There's also a "native.atr", which is a DOS 2.0S bootable disk with
+the relocator compiled for the Atari, as MKRELOC.XEX. This will load
+with DOS's L command, and will read LO.XEX and HI.XEX (which are
+non-relocatable) and create a relocatable AUTORUN.SYS. Reboot
+to see the demo run.
+
+Usage
+-----
+
+To create self-relocating executables from your own software, you can
+use either a modern system running a 6502 cross-assembler (atasm,
+xa65, ca65, dasm, etc.) or an Atari 8-bit.
+
+First, write your code. There are some limitations:
+
+- All your code and data must be in a single segment. Generally
+  this means: set the origin only once, and don't use *= or .org
+  again until the end (for RUNAD and/or INITAD).
+- Your code's origin (start address) must be exactly on a page
+  boundary ($xx00), at $2800 or higher.
+- You can use only one init address.
+
+Once your code is written and tested:
+
+- Assemble the code at a start address of $2800 or higher, as a
+  regular Atari executable (.xex/.com/.bin file). The executable must
+  be called "lo.xex" if you're cross-assembling, or "D:LO.XEX" if
+  you're using the Atari.
+- Change the start address (*= or .org directive) so that it's
+  one page higher. If you used $2800, you'd change it to $2900.
+- Assemble the code again, to an executable called "hi.xex",
+  or "D:HI.XEX" on the Atari.
+- Make sure you have the reloc.xex (D:RELOC.XEX) and mkreloc
+  (D:MKRELOC.XEX) files in the same directory (or on the same disk).
+- Run the relocator. On a modern system the command will be
+  "./mkreloc" (or possibly just "mkreloc" if you installed it
+  somewhere on your $PATH). On the Atari, load D:MKRELOC.XEX
+  from the DOS menu.
+- If you're using an Atari, wait a bit. Listen to the disk I/O
+  beeps... when it's finished, you'll be back at the DOS menu.
+  You will have a brand-new AUTORUN.SYS, which is the
+  self-relocating version of your program. You can reboot to run it.
+- If you're on a modern system, you'll have a (lowercase)
+  autorun.sys, which you can copy to a DOS disk image. You
+  can also test-run it by directly loading it with an
+  emulator (e.g. "atari800 autorun.sys"), if it can run
+  without DOS.
+
 How it works
 ------------
 
@@ -41,20 +93,16 @@ The code from Insight: Atari is doesn't produce self-relocating
 executables. What it produces is BASIC programs that have the
 relocatable object code as DATA statements, POKEd into memory when
 run. The relocator presented here gets appended to your standard
-executable and relocates it "on the fly", then jumps to the start of
-the relocated code.
-
-The Insight: Atari article mentions that OSS languages use a scheme
-like this to relocate themselves when loaded. The sources for the
-OSS languages that have been released have a BASIC XL program that
-generates the bitmaps.
+executable and relocates it "on the fly", then calls the relocated
+init address and/or points RUNAD at the relocated run address.
 
 Example: a subroutine call to within our own code:
 
  JSR print_banner
 
-This is the first instruction in our program, so it will be found
-at $4000 for the first assembly pass, and $4100 for the second.
+This is the first instruction in our program. Say we assemble it at
+$4000: it will be found at $4000 in the first assembly (lo.xex), and
+at $4100 in the second (hi.xex).
 
 Say print_banner ends up at $4123 when we assemble at $4000, and $4223
 when assembling at $4100. Further, we determine MEMLO has $1D80. So,
@@ -78,9 +126,9 @@ The tables are followed by an INITAD segment that runs the relocator
 code. The relocator and the tables have to load at a fixed address,
 but once they're finished running, they won't be needed again.
 
-The relocator has to know the load address and the length of
-the "payload" segment of the program (the part it's going to
-relocate). What it does:
+The relocator has to know the load address and the length of the
+"payload" segment of the program (the part it's going to relocate). At
+load time, it gets run via INITAD. What it does:
 
 1. Subtract the high byte of MEMLO from the high byte of the load address
    ($4000 in the example), then add 1. This gives us a positive number
@@ -105,10 +153,6 @@ relocate). What it does:
 
 Notes:
 
-- To keep things simple, the program must consist of a single
-  segment of code and data, followed by an init address and/or an run
-  address.
-
 - If your program is a device driver or a "TSR", you should use an
   init address, NOT a run address. This allows users to append your
   program to e.g. an RS-232 driver, and maybe a RAMdisk driver too,
@@ -124,46 +168,30 @@ Notes:
   the relocator and tables load. The reason for this restriction
   is to allow the relocatable executable to work with a 16K cartridge.
   The lowest sane start address for the program is probably $2800,
-  which allows the program to be 17KB in size... though $3000 is
-  a lot safer (15KB max).
+  which allows the program to be 18,880 bytes (about 18.4KB)... though
+  $3000 is a lot safer (16,832 bytes, about 16.4KB).
 
 - Whatever start address (ORG) you use for the program, it has to
   be higher than the current MEMLO when the relocation is done.
-  That's why I said $3000 is safer than $2000: if someone uses a fancy
-  DOS and/or have lots of device drivers loaded, MEMLO could exceed
-  $2000, which would cause your program to crash when loaded.
+  That's why I said $3000 is safer than $2800: if someone uses a fancy
+  DOS and/or has lots of device drivers loaded, MEMLO could exceed
+  $2800, which would cause your program to crash when loaded.
 
 - Also, the start address has to start on a page boundary ($xx00).
+  Since it's relocated by whole pages, JMP (indirect) stays safe:
+  relocation changes only the high byte of the vector's address, so
+  a vector that isn't at $xxFF beforehand won't be at $xxFF after.
 
 - The original Wilkinson scheme was done entirely in Atari BASIC.
-  I use a perl script to create the relocation tables and the
+  I use a C program to create the relocation tables and the
   relocator itself becomes part of the relocatable program, so BASIC
-  is not required. The perl script will be rewritten in C at some
-  point, and the the C program will run on either the Atari or on
-  a modern POSIX system.
-
-- Indirect JMP instructions should always be used with care on the
-  6502. The two operand bytes have to be in the same page, due to a
-  6502 bug. Most 6502 asm programmers know how to handle this... but
-  with dynamically relocatable code, there's not really a good way to
-  do it. Best to avoid indirect JMPs. One simple workaround is to use
-  self-modifying code: Have an absolute JMP instruction in your code,
-  and store the indirect jump's destination there. Example:
-
- JMP (VECTOR)
-
-...becomes:
-
- LDA VECTOR
- STA TRAMPOLINE+1
- LDA VECTOR+1
- STA TRAMPOLINE+2
- JMP TRAMPOLINE
- ; somewhere in the code you have this:
-TRAMPOLINE JMP $0000
+  is not required. The relocator-generator runs on either the
+  Atari or a modern POSIX system.
 
-  Another way to do it would be to use call-by-RTS (push the jump
-  address minus one on the stack, then execute RTS).
+- The Insight: Atari article mentions that OSS languages use a scheme
+  like this to relocate themselves when loaded. The sources for the
+  OSS languages that have been released have a BASIC XL program that
+  generates the bitmaps.
 
 - If your code has really tight cycle-counted timing loops, the timing
   might get thrown off due to relocation causing a branch to cross a
@@ -173,10 +201,12 @@ TRAMPOLINE JMP $0000
   Games "take over" the whole machine and don't have to care about MEMLO
   or other software needing free RAM.
 
-Format of the relocatable executable:
+Format of the relocatable executable
+------------------------------------
 
-- Segment with the original code, at the original load address.
-- Segment with the relocator code and relocation tables.
+- Segment with the original code, at the original load address. This
+  is actually a copy of the first segment of lo.xex.
+- Segment with the relocator code (from reloc.xex) and relocation tables.
 - INITAD segment that runs the relocator code.
 
 Note that the original RUNAD and INITAD segments (if any) don't appear
@@ -192,11 +222,13 @@ First table is 8 bytes (4 words):
 
 The next N bytes are the relocation bitmap table. See below.
 
-For the init address, if it's not zero, the relocator JSR's to it (at its
-new location).
+For the init address, if it's not zero, the relocator jumps to it (at
+its new location). As usual, when the init code is done, it exits with
+an RTS, which will hand control back to DOS.
 
 For the run address, if it's not zero, the relocator adjusts RUNAD,
-and DOS uses RUNAD as usual when the program's done loading.
+and DOS uses RUNAD as usual when the program's done loading. Again,
+an RTS returns to DOS.
 
 Example:
 
@@ -276,6 +308,6 @@ relocator - $71C0 to $72BF
 8-byte table: $72C0 to $72C7
 bitmap - $72C8 to $7C00
 
-18880 bytes is the maximum size. Actually, the relocator is only 185
+18880 bytes is the maximum size. Actually, the relocator is only 183
 bytes, and the table could extend to $7C1F without overwriting the
 display list.
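The size limits quoted in the notes can be checked against the memory map above. The small C sketch below rests on several assumptions. The relocator always loads at $71C0. The bitmap starts at $72C8 and may run up to $7C1F. The bitmap holds one bit per payload byte. The names max_payload, bitmap_bytes and bitmap_fits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses from the memory map above, ASSUMED fixed for every build. */
enum {
    RELOCATOR_START = 0x71C0, /* the payload must end just below this */
    BITMAP_START    = 0x72C8, /* right after the 8-byte table at $72C0 */
    BITMAP_LIMIT    = 0x7C20  /* $7C1F is the last byte before the display list */
};

/* Largest payload that fits between a page-aligned start and the relocator. */
static unsigned max_payload(uint16_t start)
{
    assert((start & 0xFF) == 0 && start < RELOCATOR_START);
    return (unsigned)(RELOCATOR_START - start);
}

/* One bitmap bit per payload byte, rounded up to whole bytes. */
static unsigned bitmap_bytes(unsigned len)
{
    return (len + 7) / 8;
}

/* Nonzero if the bitmap for a len-byte payload ends before the display list. */
static int bitmap_fits(unsigned len)
{
    return BITMAP_START + bitmap_bytes(len) <= (unsigned)BITMAP_LIMIT;
}
```

From $2800 this gives 18,880 bytes, and its 2,360-byte bitmap ends exactly at $7C00. From $3000 it gives 16,832 bytes.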