How to do a self-relocating Atari 8-bit executable...
This is a modified form of a technique I saw in Bill Wilkinson's
Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).
In the original scheme, you'd assemble the code twice, with the origin
(start address) one page apart. Say, assemble at address $4000, then
the 2nd time at $4100. Any bytes that differ by 1 between the two
object files are the ones that need to be changed when relocating. Suppose
you want to relocate to $2000: you just subtract $20 from every byte in
the first file that is 1 less than the same byte in the 2nd file.
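
A rough sketch of that original page-only scheme, in Python rather than BASIC. The function names and the five-byte sample program are made up for illustration; the sample is hand-assembled (JSR to an RTS a few bytes further on), not taken from the Compute! article.

```python
def find_page_relocs(obj_a, obj_b):
    """Offsets whose byte is exactly 1 higher in the second assembly."""
    return [i for i, (a, b) in enumerate(zip(obj_a, obj_b))
            if (b - a) & 0xFF == 1]


def relocate_to_page(obj_a, relocs, old_page, new_page):
    """Adjust every flagged (high) byte by the difference in page numbers."""
    delta = new_page - old_page
    out = bytearray(obj_a)
    for i in relocs:
        out[i] = (out[i] + delta) & 0xFF
    return bytes(out)


# Hypothetical sample: JSR $4004 / RTS / RTS, assembled at $4000 and at $4100.
AT_4000 = bytes.fromhex("2004406060")
AT_4100 = bytes.fromhex("2004416060")

relocs = find_page_relocs(AT_4000, AT_4100)
moved = relocate_to_page(AT_4000, relocs, 0x40, 0x20)  # relocate to $2000
print(relocs, moved.hex())
```

Only the high byte of the JSR operand is flagged, and relocating from page $40 to page $20 subtracts $20 from it, as described above.
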
This works, and is simple enough. The limitation is, you can only
relocate to an even page boundary. If you want to relocate to the
bottom of memory (pointed to by MEMLO), you probably will waste a few
bytes. In DOS 2.0S, I get $1CFC in MEMLO. Relocating to an even page
boundary means the code goes at $1D00, and the 4 bytes from $1CFC
to $1D00 are wasted. That's not so bad... but if I enable another
drive in DOS, that bumps MEMLO up by 128 bytes, to $1D7C. Then my
relocatable code ends up at $1E00, and I waste 132 bytes below that...
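
The waste is just the distance from MEMLO up to the next page boundary. A tiny sketch of the arithmetic, using the two MEMLO values quoted above (the helper name is made up):

```python
def page_align(memlo):
    """Round MEMLO up to the next page boundary.

    Returns (aligned address, number of bytes wasted below it).
    """
    aligned = (memlo + 0xFF) & 0xFF00
    return aligned, aligned - memlo


for memlo in (0x1CFC, 0x1D7C):
    addr, wasted = page_align(memlo)
    print("MEMLO $%04X -> code at $%04X, %d bytes wasted" % (memlo, addr, wasted))
```
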
In the modified form presented here, the code is still assembled
twice, but the 2nd pass is ORG'ed 258 ($0102) bytes higher than
the first. Now we have bytes that differ by one (the high bytes of
addresses) and others that differ by two (the low bytes).
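
Here's a minimal sketch of how the two assemblies could be turned into the high-byte and low-byte tables. It is not the actual table generator, just the differ-by-one / differ-by-two rule stated above; the function name and the sample bytes are hypothetical. One caveat of this naive rule: if a low byte is $FE or $FF in the first assembly, adding $0102 carries into the high byte, so that high byte differs by 2 and would be misfiled. How a real tool deals with that case isn't covered here.

```python
def build_tables(obj_a, obj_b, load_addr):
    """Compare two assemblies whose ORGs are $0102 apart.

    Bytes that grew by 1 are treated as address high bytes, bytes that
    grew by 2 as address low bytes. Returns (hi_table, lo_table), each a
    list of absolute addresses within the first assembly.
    """
    hi_table, lo_table = [], []
    for i, (a, b) in enumerate(zip(obj_a, obj_b)):
        diff = (b - a) & 0xFF
        if diff == 1:
            hi_table.append(load_addr + i)
        elif diff == 2:
            lo_table.append(load_addr + i)
    return hi_table, lo_table


# Hypothetical sample: JSR to an RTS 4 bytes in, assembled at $4000 and $4102.
AT_4000 = bytes.fromhex("2004406060")  # JSR $4004 / RTS / RTS
AT_4102 = bytes.fromhex("2006416060")  # JSR $4106 / RTS / RTS

hi_table, lo_table = build_tables(AT_4000, AT_4102, 0x4000)
print([hex(a) for a in hi_table], [hex(a) for a in lo_table])
```
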
Another, more serious limitation of the code from Insight: Atari is
that it doesn't produce self-relocating executables. What it produces
is BASIC programs that have the relocatable object code as DATA
statements, POKEd into memory when run. The relocator presented here
gets appended to your standard executable and relocates it "on the
fly", then jumps to the start of the relocated code.
Example: a subroutine call to within our own code:
JSR print_banner
This is the first instruction in our program, so it will be found
at $4000 for the first assembly pass, and $4102 for the second.
Say print_banner ends up at $4123 when we assemble at $4000, and $4225
when assembling at $4102. Further, we determine MEMLO has $1D80. So,
when we relocate the program, it ends up at $1D80. The target of the
JSR instruction has to be adjusted to match the new location where
print_banner is going to be: $4123 - $4000 + $1D80 = $1EA3.
The code that does the relocation, we'll call the relocator. The term
"relocating loader" is used elsewhere, but it's not accurate here: DOS
is the loader, and we're not replacing it.
The relocator is a small routine that gets appended to the first
executable (the $4000 one) as a segment, plus two data tables (one
each for low and high bytes), as another 2 segments, plus an INITAD
segment that runs the relocator code. These all have to load at a
fixed address, but once they're finished running, they won't be needed
again.
The relocator has to know the load address and the length of the main
segment of the program (the part it's going to relocate). What it
does:
1. Subtract the load address ($4000 in the example) from the contents
of MEMLO. This gives us a negative number (we hope!) that is the
amount each address in the program should have added to it.
2. Iterate over the two data tables, adding the offset. Each table entry
is the two-byte address of a byte that needs to be changed (an
absolute address that's "baked" into the program). The high and low
bytes of the addresses in the code are handled separately (hence
the two tables). The low byte of the offset is added to the bytes
at the addresses in the low-byte table, and the high byte of the
offset for the high-byte table.
3. Move the main segment to MEMLO.
4. Set MEMLO to point to the byte after the end of the program
to protect it from being overwritten by e.g. BASIC or ASM/ED.
5. Add the offset to the contents of RUNAD, which is the run address
of the program, and then do an RTS to hand control back to DOS.
DOS will run the relocated code by jumping to the altered RUNAD.
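
The bookkeeping in steps 1, 4 and 5 can be sketched like this. It's only the arithmetic, not the relocator itself. The numbers are the ones from the print_banner example ($4000 load address, MEMLO at $1D80); the end address $41FF and the function names are assumptions for illustration.

```python
def relocation_plan(load, end, run, memlo):
    """Work out the 16-bit offset and the addresses that change."""
    offset = (memlo - load) & 0xFFFF      # step 1: "negative" as two's complement
    length = end - load + 1
    return {
        "offset": offset,
        "new_start": memlo,               # step 3: where the segment is moved to
        "new_memlo": memlo + length,      # step 4: first byte after the program
        "new_run": (run + offset) & 0xFFFF if run else 0,  # step 5
    }


def relocate_address(addr, offset):
    """Add the offset to one absolute address, wrapping at 16 bits."""
    return (addr + offset) & 0xFFFF


# ASSUMPTION: the program ends at $41FF and runs from its first byte.
plan = relocation_plan(load=0x4000, end=0x41FF, run=0x4000, memlo=0x1D80)
print({k: "$%04X" % v for k, v in plan.items()})
print("print_banner -> $%04X" % relocate_address(0x4123, plan["offset"]))
```

The offset comes out as $DD80, which is -$2280 in 16-bit two's complement, so adding it and throwing away the carry out of bit 15 is the same as subtracting $2280.
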
Notes:
- To keep things simple, the program must consist of a single
segment of code and data, followed by an init address and/or a run
address.
- If your program is a device driver or a "TSR", you should use an
init address, NOT a run address. This allows users to append your
program to e.g. an RS-232 driver, and maybe a RAMdisk driver too,
etc. Each driver should have an init address, because Atari
executables can have multiple init addresses.
- If your program is an application, it's usually better to use a run
address. If you use an init address, your program will run, but DOS
will still be "in the middle of" loading the executable, meaning
IOCB #1 will still be open for reading.
- The program's end address must be below $6C00, since that's where
the relocator and tables load. The reason for this restriction
is to allow the relocatable executable to work with a 16K cartridge.
The lowest sane start address for the program is probably $2000,
which allows the program to be 19KB in size... though $3000 is
a lot safer (15KB max).
- Whatever start address (ORG) you use for the program, it has to
be higher than the current MEMLO when the relocation is done.
That's why I said $3000 is safer than $2000: if someone uses a fancy
DOS and/or has lots of device drivers loaded, MEMLO could exceed
$2000, which would cause your program to crash when loaded.
- The data tables' combined size must not exceed 4K. Generally the
tables will be the same size, and each entry is 2 bytes, so this
means you can't have more than about 1000 absolute references in
your code. This doesn't count references that point outside your
code, e.g. JMP CIOV or STA CRSINH; these must not be relocated,
or your program wouldn't work. As a reference, the 8K Atari BASIC
cartridge would require 1522 bytes of data tables, if we were trying
to relocate it.
- The original Wilkinson scheme was done entirely in Atari BASIC. I
use a C program to create the relocation tables, and the relocator
itself becomes part of the relocatable program, so BASIC is not
required. The C program can be run on either the Atari or on
a modern POSIX system, which is especially useful if you use a
cross-assembler to write and assemble your Atari code.
- Indirect JMP instructions should always be used with care on the
6502. The two bytes of the vector that the operand points to have to
be in the same page, due to a 6502 bug (JMP ($xxFF) fetches its high
byte from $xx00). Most 6502 asm programmers know how to handle this... but
with dynamically relocatable code, there's not really a good way to
do it. Best to avoid indirect JMPs. One simple workaround is to use
self-modifying code: Have an absolute JMP instruction in your code,
and store the indirect jump's destination there. Example:
        JMP (VECTOR)
  ...becomes:
        LDA VECTOR
        STA TRAMPOLINE+1
        LDA VECTOR+1
        STA TRAMPOLINE+2
        JMP TRAMPOLINE
        ; somewhere in the code you have this:
    TRAMPOLINE JMP $0000
Another way to do it would be to use call-by-RTS (push the jump
address minus one on the stack, then execute RTS).
- If your code has really tight cycle-counted timing loops, the timing
might get thrown off due to relocation causing a branch to cross a
page boundary, when it was originally not supposed to. This kind of
code generally only belongs in games and demos. Relocatable code is
usually used for things like device drivers or programming utilities.
Games "take over" the whole machine and don't have to care about MEMLO
or other software needing free RAM.
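
The hard limits from the notes above ($6C00 ceiling, ORG above MEMLO, 4K of tables) are easy to check mechanically. A hedged sketch; the function name is made up, and exactly what counts toward the 4K is an assumption flagged in the code.

```python
RELOCATOR_LOAD = 0x6C00      # where the relocator and tables load, per the notes
MAX_TABLE_BYTES = 4 * 1024   # combined size limit for the two tables


def check_relocatable(load, end, memlo, hi_entries, lo_entries):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if end >= RELOCATOR_LOAD:
        problems.append("end $%04X is not below $%04X" % (end, RELOCATOR_LOAD))
    if load <= memlo:
        problems.append("ORG $%04X is not above MEMLO $%04X" % (load, memlo))
    # ASSUMPTION: the 4K budget counts 2 bytes per entry plus each table's
    # 2-byte $0000 terminator; the notes don't say exactly what is counted.
    table_bytes = 2 * (hi_entries + 1) + 2 * (lo_entries + 1)
    if table_bytes > MAX_TABLE_BYTES:
        problems.append("tables need %d bytes, limit is %d"
                        % (table_bytes, MAX_TABLE_BYTES))
    return problems


print(check_relocatable(0x4000, 0x4016, 0x1CFC, 4, 4))   # small program, plain DOS
print(check_relocatable(0x2000, 0x6BFF, 0x2100, 4, 4))   # MEMLO above the ORG
```
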
Format of the relocatable executable:
- Segment with the original code, at the original load address.
- Segment with the relocator code and relocation tables.
- INITAD segment that runs the relocator code.
Note that the original RUNAD and INITAD segments (if any) don't appear
in the relocatable file as segments.
Relocation tables start immediately after the last byte of the relocator.
First 8 bytes are 4 words:
- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)
The next N bytes are the high-byte relocation table. Each entry
is a word, the address of a byte within the program that has to be
relocated. The table ends with $0000.
The next N bytes are the low-byte table, same format as the high-byte
table including the $0000 at the end. The high and low byte tables
will generally be the same size, but this is not a requirement.
For the init address, if it's not zero, the relocator JSR's to it (at its
new location).
For the run address, if it's not zero, the relocator adjusts RUNAD,
and DOS uses RUNAD as usual when the program's done loading.
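
To make the layout concrete, here's a sketch that packs the four header words and the two $0000-terminated tables as little-endian words, in the order described above. The function name is made up, and the values are the ones from the small example program in this document.

```python
import struct


def pack_tables(load, end, run, init, hi_table, lo_table):
    """Header (4 words), high-byte table, $0000, low-byte table, $0000."""
    words = [load, end, run, init] + list(hi_table) + [0] + list(lo_table) + [0]
    return struct.pack("<%dH" % len(words), *words)  # "<H" = little-endian word


data = pack_tables(
    load=0x4000, end=0x4016, run=0x0000, init=0x4000,
    hi_table=[0x4002, 0x4005, 0x4009, 0x4010],
    lo_table=[0x4001, 0x4004, 0x4008, 0x400F],
)
print(len(data), data.hex(" "))
```

That's 8 bytes of header plus 10 bytes per table (four entries and a terminator each), 28 bytes in all.
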
Example:
         *=$4000
    start:
         jsr set_color   ; $4000  JSR $4007
         jsr set_cursor  ; $4003  JSR $400E
         rts             ; $4006
    set_color:
         lda bgcolor     ; $4007  LDA $4015
         sta COLOR2      ; $400A
         rts             ; $400D
    set_cursor:
         lda cursor      ; $400E  LDA $4016
         sta CRSINH      ; $4011
         rts             ; $4014
    bgcolor: .byte $00   ; $4015
    cursor:  .byte $01   ; $4016
         *=INITAD
         .word start
The address table for the above program:
    $00 $40 ; code_start
    $16 $40 ; code_end
    $00 $00 ; code_run (no run address)
    $00 $40 ; code_init
High byte relocation table:
    $02 $40 ; hi byte of JSR $4007 operand
    $05 $40 ; hi byte of JSR $400E operand
    $09 $40 ; hi byte of LDA $4015 operand
    $10 $40 ; hi byte of LDA $4016
    $00 $00 ; terminator
Low byte relocation table:
    $01 $40 ; lo byte of JSR $4007 operand
    $04 $40 ; lo byte of JSR $400E operand
    $08 $40 ; lo byte of LDA $4015 operand
    $0F $40 ; lo byte of LDA $4016
    $00 $00 ; terminator
The program loads from $4000 to $4016. If MEMLO is $1CFC, the relocator
will move the program to $1CFC through $1D12 and set MEMLO to $1D13. The
operand of the first instruction (was JSR $4007) will be altered
to $1D03 (aka $4007 - $4000 + $1CFC), which is the address that the
subroutine got relocated to.
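
The whole example can be replayed on a simulated 64K of memory. This is a model of the patch-and-move steps, not the relocator's actual 6502 code. The program bytes are hand-assembled from the listing above (COLOR2 = $02C6 and CRSINH = $02F0 are the usual OS equates). One thing is an assumption of this sketch: how the carry from a low byte gets into its high byte isn't spelled out here, so the model assumes each low-byte entry is immediately followed in memory by its matching high byte, which holds for the absolute operands in this program.

```python
def relocate(mem, load, end, memlo, hi_table, lo_table):
    """Patch the listed bytes, move the segment to MEMLO, return new MEMLO."""
    offset = (memlo - load) & 0xFFFF
    off_lo, off_hi = offset & 0xFF, offset >> 8
    carried = set()
    for addr in lo_table:
        total = mem[addr] + off_lo
        mem[addr] = total & 0xFF
        if total > 0xFF:
            carried.add(addr + 1)  # ASSUMPTION: matching high byte is next
    for addr in hi_table:
        carry = 1 if addr in carried else 0
        mem[addr] = (mem[addr] + off_hi + carry) & 0xFF
    length = end - load + 1
    mem[memlo:memlo + length] = mem[load:load + length]
    return memlo + length


PROGRAM = bytes.fromhex(
    "200740" "200E40" "60"       # start: jsr set_color / jsr set_cursor / rts
    "AD1540" "8DC602" "60"       # set_color: lda bgcolor / sta COLOR2 / rts
    "AD1640" "8DF002" "60"       # set_cursor: lda cursor / sta CRSINH / rts
    "00" "01"                    # bgcolor, cursor
)
mem = bytearray(0x10000)
mem[0x4000:0x4000 + len(PROGRAM)] = PROGRAM

new_memlo = relocate(mem, 0x4000, 0x4016, 0x1CFC,
                     hi_table=[0x4002, 0x4005, 0x4009, 0x4010],
                     lo_table=[0x4001, 0x4004, 0x4008, 0x400F])
first_operand = mem[0x1CFD] | (mem[0x1CFE] << 8)
print("JSR $%04X, MEMLO now $%04X" % (first_operand, new_memlo))
```

The references to COLOR2 and CRSINH aren't in either table, so they come through the move unchanged.
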
The original program assembled to a 32-byte file. The relocatable
version will be around 200 bytes: 28 bytes for the original file
(minus its INITAD segment), ~128 bytes for the relocator code, 8 bytes
for the address table, and 20 bytes for the two relocation tables.
However, the relocator and tables are only used once, and can be
overwritten afterwards (so they count as free memory).