README.txt


How to do a self-relocating Atari 8-bit executable...

This is a modified form of a technique I saw in Bill Wilkinson's
Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).

In the original scheme, you'd assemble the code twice, with the origin
(start address) one page apart. Say, assemble at address $4000, then
the 2nd time at $4100. Now, any bytes in the two object files that
differ by 1 are the ones that need to be changed when relocating.
Suppose you want to relocate to $2000: you just subtract $20 from every
byte in the first file that is 1 less than the same byte in the 2nd file.
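
As a rough illustration of the original scheme, here's a Python sketch
(not Wilkinson's BASIC code; the function name and its arguments are
made up for this example). It assumes both images are the same length
and that both origins sit on a page boundary:

```python
def relocate_page_aligned(img_a, img_b, org_a, new_org):
    """Relocate img_a (assembled at org_a) to new_org, a page boundary.

    img_b is the same code assembled exactly one page ($0100) higher.
    Hypothetical helper, for illustration only.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must be the same length")
    if (org_a & 0xFF) or (new_org & 0xFF):
        raise ValueError("both origins must be on a page boundary")
    # Number of pages to move, as an 8-bit two's complement value.
    page_delta = ((new_org - org_a) >> 8) & 0xFF
    out = bytearray(img_a)
    for i, (a, b) in enumerate(zip(img_a, img_b)):
        if (b - a) & 0xFF == 1:  # high byte of an internal address
            out[i] = (a + page_delta) & 0xFF
    return bytes(out)


# JSR $4123 assembled at $4000, and the same code assembled at $4100.
at_4000 = bytes([0x20, 0x23, 0x41])
at_4100 = bytes([0x20, 0x23, 0x42])
print(relocate_page_aligned(at_4000, at_4100, 0x4000, 0x2000).hex())
# prints 202321, i.e. JSR $2123
```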

This works, and is simple enough. The limitation is, you can only
relocate to a page boundary. If you want to relocate to the
bottom of memory (pointed to by MEMLO), you probably will waste a few
bytes. In DOS 2.0S, I get $1CFC in MEMLO. Relocating to a page
boundary means the code goes at $1D00, and the 4 bytes from $1CFC
to $1CFF are wasted. That's not so bad... but if I enable another
drive in DOS, that bumps MEMLO up by 128 bytes, to $1D7C. Then my
relocatable code ends up at $1E00, and I waste 132 bytes below that...
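
The arithmetic behind those numbers is just "round MEMLO up to the next
page". A tiny Python sketch (the function name is made up):

```python
def page_aligned_load(memlo):
    """Return (load_address, wasted_bytes) when rounding MEMLO up to a page."""
    load = (memlo + 0xFF) & 0xFF00
    return load, load - memlo


for memlo in (0x1CFC, 0x1D7C):
    load, wasted = page_aligned_load(memlo)
    print(f"MEMLO ${memlo:04X}: load at ${load:04X}, {wasted} bytes wasted")
# MEMLO $1CFC: load at $1D00, 4 bytes wasted
# MEMLO $1D7C: load at $1E00, 132 bytes wasted
```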

In the modified form presented here, the code is still assembled
twice, but the 2nd pass is ORG'ed 258 ($0102) bytes higher than
the first. Now we have bytes that differ by one (the high bytes of
addresses) and others that differ by two (the low bytes).
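
Here's a minimal Python sketch of that comparison. It is not the actual
table-building tool (which is written in C), and the names are made up.
It also makes a simplifying assumption: it ignores the corner case
where adding 2 to a low byte carries into the high byte, which would
make that high byte differ by 2 instead of 1.

```python
def build_tables(img_a, img_b, org_a):
    """Compare two assemblies whose origins differ by $0102.

    Returns (hi_table, lo_table): lists of addresses (in the org_a
    image) of bytes that differ by 1 and by 2 respectively.
    ASSUMPTION: no low byte carries into its high byte (see above).
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must be the same length")
    hi_table, lo_table = [], []
    for i, (a, b) in enumerate(zip(img_a, img_b)):
        diff = (b - a) & 0xFF
        if diff == 1:
            hi_table.append(org_a + i)
        elif diff == 2:
            lo_table.append(org_a + i)
    return hi_table, lo_table


# JSR $4123 assembled at $4000, and JSR $4225 assembled at $4102.
hi, lo = build_tables(bytes([0x20, 0x23, 0x41]),
                      bytes([0x20, 0x25, 0x42]), 0x4000)
print([f"${x:04X}" for x in hi], [f"${x:04X}" for x in lo])
# ['$4002'] ['$4001']
```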

Another, more serious limitation of the code from Insight: Atari is
that it doesn't produce self-relocating executables. What it produces
is BASIC programs that have the relocatable object code as DATA
statements, POKEd into memory when run. The relocator presented here
gets appended to your standard executable and relocates it "on the
fly", then jumps to the start of the relocated code.

Example: a subroutine call to within our own code:

 JSR print_banner

This is the first instruction in our program, so it will be found
at $4000 for the first assembly pass, and $4102 for the second.

Say print_banner ends up at $4123 when we assemble at $4000, and $4225
when assembling at $4102. Further, we determine MEMLO contains $1D80.
So, when we relocate the program, it ends up at $1D80. The target of
the JSR instruction has to be adjusted to match the new location where
print_banner is going to be: $4123 - $4000 + $1D80 = $1EA3.

The code that does the relocation, we'll call the relocator. The term
"relocating loader" is used elsewhere, but it's not accurate here: DOS
is the loader, and we're not replacing it.

The relocator is a small routine that gets appended to the first
executable (the $4000 one) as a segment, plus two data tables (one
each for low and high bytes), as another 2 segments, plus an INITAD
segment that runs the relocator code. These all have to load at a
fixed address, but once they're finished running, they won't be needed
again.

The relocator has to know the load address and the length of the main
segment of the program (the part it's going to relocate). What it
does:

1. Subtract the load address ($4000 in the example) from the contents
   of MEMLO. This gives us a negative number (we hope!) that is the
   amount each address in the program should have added to it.

2. Iterate over the two data tables, adding the offset. Each table entry
   is the two-byte address of a byte that needs to be changed (an
   absolute address that's "baked" into the program). The high and low
   bytes of the addresses in the code are handled separately (hence
   the two tables). The low byte of the offset is added to the bytes
   at the addresses in the low-byte table, and the high byte of the
   offset for the high-byte table.

3. Move the main segment to MEMLO.

4. Set MEMLO to point to the byte after the end of the program
   to protect it from being overwritten by e.g. BASIC or ASM/ED.

5. Add the offset to the contents of RUNAD, which is the run address
   of the program, and then do an RTS to hand control back to DOS.
   DOS will run the relocated code by jumping to the altered RUNAD.
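
To make the five steps concrete, here's a Python simulation on a 64K
bytearray. It's only a sketch, not the real relocator (which is 6502
code). MEMLO ($02E7) and RUNAD ($02E0) are the standard OS/DOS
locations; everything else (function names, arguments) is made up.
One assumption is labeled in the code: how a carry out of a low byte
reaches its high byte isn't spelled out above, so the sketch takes it
from the low byte stored immediately before the high byte.

```python
MEMLO = 0x02E7  # OS variable: lowest free RAM address (2 bytes)
RUNAD = 0x02E0  # DOS run address (2 bytes)


def peek16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


def poke16(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def relocate(mem, load, end, lo_table, hi_table):
    """Simulate relocator steps 1-5; returns the new load address."""
    dest = peek16(mem, MEMLO)
    offset = (dest - load) & 0xFFFF          # step 1 (two's complement)

    carries = {}                             # step 2: low bytes first
    for addr in lo_table:
        total = mem[addr] + (offset & 0xFF)
        mem[addr] = total & 0xFF
        carries[addr] = total >> 8
    for addr in hi_table:                    # step 2: then high bytes
        # ASSUMPTION: the carry comes from the low byte just before.
        carry = carries.get(addr - 1, 0)
        mem[addr] = (mem[addr] + (offset >> 8) + carry) & 0xFF

    length = end - load + 1
    mem[dest:dest + length] = mem[load:load + length]        # step 3
    poke16(mem, MEMLO, dest + length)                        # step 4
    poke16(mem, RUNAD, (peek16(mem, RUNAD) + offset) & 0xFFFF)  # step 5
    return dest


mem = bytearray(0x10000)
mem[0x4000:0x4005] = bytes([0x20, 0x04, 0x40, 0x60, 0x60])  # JSR $4004/RTS/RTS
poke16(mem, MEMLO, 0x1CFC)
poke16(mem, RUNAD, 0x4000)
relocate(mem, 0x4000, 0x4004, lo_table=[0x4001], hi_table=[0x4002])
print(mem[0x1CFC:0x1D01].hex())  # 20001d6060, i.e. JSR $1D00/RTS/RTS
```

After the call, MEMLO holds $1D01 and RUNAD holds $1CFC.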

Notes:

- To keep things simple, the program must consist of a single
  segment of code and data, followed by an init address and/or a run
  address.

- If your program is a device driver or a "TSR", you should use an
  init address, NOT a run address. This allows users to append your
  program to e.g. an RS-232 driver, and maybe a RAMdisk driver too,
  etc. Each driver should have an init address, because Atari
  executables can have multiple init addresses.

- If your program is an application, it's usually better to use a run
  address. If you use an init address, your program will run, but DOS
  will still be "in the middle of" loading the executable, meaning
  IOCB #1 will still be open for reading.

- The program's end address must be below $6C00, since that's where
  the relocator and tables load. The reason for this restriction
  is to allow the relocatable executable to work with a 16K cartridge.
  The lowest sane start address for the program is probably $2000,
  which allows the program to be 19KB in size... though $3000 is
  a lot safer (15KB max).

- Whatever start address (ORG) you use for the program, it has to
  be higher than the current MEMLO when the relocation is done.
  That's why I said $3000 is safer than $2000: if someone uses a fancy
  DOS and/or has lots of device drivers loaded, MEMLO could exceed
  $2000, which would cause your program to crash when loaded.

- The data tables' combined size must not exceed 4K. Generally the
  tables will be the same size, and each entry is 2 bytes, so this
  means you can't have more than about 1000 absolute references in
  your code. This doesn't count references that point outside your
  code, e.g. JMP CIOV or STA CRSINH; these must not be relocated,
  or your program wouldn't work. As a reference, the 8K Atari BASIC
  cartridge would require 1522 bytes of data tables, if we were trying
  to relocate it.

- The original Wilkinson scheme was done entirely in Atari BASIC. I
  use a C program to create the relocation tables, and the relocator
  itself becomes part of the relocatable program, so BASIC is not
  required. The C program can be run on either the Atari or on
  a modern POSIX system, which is especially useful if you use a
  cross-assembler to write and assemble your Atari code.

- Indirect JMP instructions should always be used with care on the
  6502. The two bytes of the vector that the operand points to have to
  be in the same page, due to a 6502 bug: JMP ($xxFF) fetches the high
  byte of the destination from $xx00. Most 6502 asm programmers know
  how to handle this... but with dynamically relocatable code, there's
  not really a good way to do it. Best to avoid indirect JMPs. One
  simple workaround is to use self-modifying code: Have an absolute
  JMP instruction in your code, and store the indirect jump's
  destination there. Example:

 JMP (VECTOR)

...becomes:

 LDA VECTOR
 STA TRAMPOLINE+1
 LDA VECTOR+1
 STA TRAMPOLINE+2
 JMP TRAMPOLINE
 ; somewhere in the code you have this:
TRAMPOLINE JMP $0000

  Another way to do it would be to use call-by-RTS (push the jump
  address minus one on the stack, then execute RTS).

- If your code has really tight cycle-counted timing loops, the timing
  might get thrown off due to relocation causing a branch to cross a
  page boundary, when it was originally not supposed to. This kind of
  code generally only belongs in games and demos. Relocatable code is
  usually used for things like device drivers or programming utilities.
  Games "take over" the whole machine and don't have to care about MEMLO
  or other software needing free RAM.
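
The address and size limits from the notes above can be checked before
you ever load the thing on an Atari. Here's a hedged Python sketch; the
function name is made up, and counting the two $0000 terminators
towards the 4K table limit is my assumption for this example:

```python
RELOCATOR_BASE = 0x6C00  # where the relocator and tables load
MAX_TABLE_BYTES = 4096   # combined limit for both relocation tables


def check_limits(org, end, memlo, lo_entries, hi_entries):
    """Return a list of problems; an empty list means the layout looks OK."""
    problems = []
    if end >= RELOCATOR_BASE:
        problems.append("program end must be below $6C00")
    if org <= memlo:
        problems.append("ORG must be higher than MEMLO")
    # Each entry is a 2-byte address. ASSUMPTION: each table's 2-byte
    # $0000 terminator counts towards the limit too.
    table_bytes = 2 * (lo_entries + 1) + 2 * (hi_entries + 1)
    if table_bytes > MAX_TABLE_BYTES:
        problems.append("relocation tables exceed 4K")
    return problems


print(check_limits(0x3000, 0x6BFF, 0x1CFC, 1000, 1000))  # []
print(check_limits(0x2000, 0x6BFF, 0x2100, 10, 10))
# ['ORG must be higher than MEMLO']
```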

Format of the relocatable executable:

- Segment with the original code, at the original load address.
- Segment with the relocator code and relocation tables.
- INITAD segment that runs the relocator code.

Note that the original RUNAD and INITAD segments (if any) don't appear
in the relocatable file as segments.

Relocation tables start immediately after the last byte of the relocator.

First 8 bytes are 4 words:
- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)

The next N bytes are the high-byte relocation table. Each entry
is a word, the address of a byte within the program that has to be
relocated. The table ends with $0000.

The next N bytes are the low-byte table, same format as the high-byte
table including the $0000 at the end. The high and low byte tables
will generally be the same size, but this is not a requirement.

For the init address, if it's not zero, the relocator JSR's to it (at its
new location).

For the run address, if it's not zero, the relocator adjusts RUNAD,
and DOS uses RUNAD as usual when the program's done loading.
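
For anyone poking at these files on a modern system, the layout above
can be read back with a few lines of Python. This is a sketch only: the
function name and the dict keys are made up, and it assumes the blob
starts at the first header word (i.e. right after the relocator code)
and that words are stored low byte first, as usual on the 6502.

```python
import struct


def parse_tables(blob):
    """Split a table block into (header, hi_table, lo_table)."""
    load, end, run, init = struct.unpack_from("<4H", blob, 0)
    pos = 8
    tables = []
    for _ in range(2):  # high-byte table first, then low-byte table
        entries = []
        while True:
            (addr,) = struct.unpack_from("<H", blob, pos)
            pos += 2
            if addr == 0:  # $0000 terminator
                break
            entries.append(addr)
        tables.append(entries)
    header = {"load": load, "end": end, "run": run, "init": init}
    return header, tables[0], tables[1]


blob = (struct.pack("<4H", 0x4000, 0x4002, 0x4000, 0)  # header words
        + struct.pack("<2H", 0x4002, 0)                # high-byte table
        + struct.pack("<2H", 0x4001, 0))               # low-byte table
header, hi_table, lo_table = parse_tables(blob)
print(header, hi_table, lo_table)
```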

Example:

 *=$4000
start:
 jsr set_color    ; $4000 JSR $4007
 jsr set_cursor   ; $4003 JSR $400E
 rts              ; $4006
set_color:
 lda bgcolor       ; $4007 LDA $4015
 sta COLOR2        ; $400A
 rts               ; $400D
set_cursor:
 lda cursor        ; $400E LDA $4016
 sta CRSINH        ; $4011
 rts               ; $4014
bgcolor: .byte $00 ; $4015
cursor:  .byte $01 ; $4016
 *=INITAD
 .word start

The address table for the above program:

$00 $40 - code_start
$16 $40 - code_end
$00 $00 - code_run (no run address)
$00 $40 - code_init

High byte relocation table:

$02 $40 ; hi byte of JSR $4007 operand
$05 $40 ; hi byte of JSR $400E operand
$09 $40 ; hi byte of LDA $4015 operand
$10 $40 ; hi byte of LDA $4016
$00 $00 ; terminator

Low byte relocation table:

$01 $40 ; lo byte of JSR $4007 operand
$04 $40 ; lo byte of JSR $400E operand
$08 $40 ; lo byte of LDA $4015 operand
$0F $40 ; lo byte of LDA $4016
$00 $00 ; terminator

Program loads from $4000 to $4016. If MEMLO was $1CFC, the relocator
will move the program to $1CFC - $1D12 and set MEMLO to $1D13. The
operand of the first instruction (was JSR $4007) will be altered
to $1D03 (aka $4007 - $4000 + $1CFC), which is the address that the
subroutine got relocated to.
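
The numbers in that paragraph can be double-checked with a few lines of
Python (a sketch; the function names are made up):

```python
def relocated_layout(load, end, memlo):
    """Return (new_start, new_end, new_memlo) for a program moved to MEMLO."""
    offset = memlo - load
    new_end = end + offset
    return memlo, new_end, new_end + 1


def relocated_operand(addr, load, memlo):
    """New value of an absolute address that points inside the program."""
    return (addr - load + memlo) & 0xFFFF


start, end, new_memlo = relocated_layout(0x4000, 0x4016, 0x1CFC)
print(f"${start:04X} - ${end:04X}, MEMLO = ${new_memlo:04X}")
for target in (0x4007, 0x400E, 0x4015, 0x4016):
    print(f"${target:04X} -> ${relocated_operand(target, 0x4000, 0x1CFC):04X}")
# $1CFC - $1D12, MEMLO = $1D13
# $4007 -> $1D03
# $400E -> $1D0A
# $4015 -> $1D11
# $4016 -> $1D12
```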

The original program assembled to a 32-byte file. The relocatable
version will be around 200 bytes: 28 bytes for the original file
(minus its INITAD segment), ~128 bytes for the relocator code, 8 bytes
for the address table, and 20 bytes for the two relocation tables.
However, the relocator and tables are only used once, and can be
overwritten afterwards (so they count as free memory).