Atari 8-Bit Self Relocator
--------------------------
This is a modified form of a technique I saw in Bill Wilkinson's
Insight: Atari column in Compute! magazine (Issue 21, Feb 1982).
To build the relocator and run the demo, you'll need:
- cc65 from https://cc65.github.io/
- axe from https://slackware.uk/~urchlay/repos/bw-atari8-tools
...as well as standard Linux packages like make and perl.
To build, just type "make". The result is "reloc.atr", which is
an Atari disk image with DOS 2.0S and the relocatable program as
AUTORUN.SYS. Boot the disk on an Atari or emulator to see it run.
The demo shows "Hello World" with changing colors, along with its own
load address, end address, and the current MEMLO. The important part
is that it got relocated to MEMLO and run from there. The code isn't
relocatable (see the source, "hello.s"). The relocator adjusted all the
absolute addresses on the fly (at load time).
How it works
------------
You assemble the code twice. The 2nd time around, you set the origin
one page higher than the first. You have two executables that are
identical except for the high bytes of absolute addresses within the
code (which differ by one). Based on this information, the relocator
can move the code to just above MEMLO and adjust all the addresses so
it'll actually run in its new location.
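As a rough illustration of the two-pass comparison, here is a minimal
Python sketch. It assumes both passes have been reduced to raw byte
strings of the main segment (no Atari file headers). The function name
find_high_bytes is hypothetical; this is not the perl script that
ships with the relocator.

```python
def find_high_bytes(image_a: bytes, image_b: bytes, org: int) -> list[int]:
    """Return the addresses (in image_a's address space) of bytes that
    differ between the two passes, i.e. high bytes of absolute addresses.

    image_a is assembled at `org`, image_b one page ($100) higher.
    """
    if len(image_a) != len(image_b):
        raise ValueError("both passes must produce the same amount of code")
    fixups = []
    for offset, (a, b) in enumerate(zip(image_a, image_b)):
        if a == b:
            continue
        # The only legitimate difference is a high byte that went up by one.
        if ((a + 1) & 0xFF) != b:
            raise ValueError(f"unexpected difference at offset {offset:#06x}")
        fixups.append(org + offset)
    return fixups


# JSR $4003 / RTS assembled at $4000, then JSR $4103 / RTS at $4100.
pass1 = bytes([0x20, 0x03, 0x40, 0x60])
pass2 = bytes([0x20, 0x03, 0x41, 0x60])
print([hex(a) for a in find_high_bytes(pass1, pass2, 0x4000)])
```

Only the JSR operand's high byte differs between the passes, so it is
the only address reported.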
Unfortunately, the code can only be relocated by multiples of 256
bytes. The low bytes aren't adjusted. So unless MEMLO happens to
contain $FF in its low byte, some memory will be wasted (up to 255
bytes).
The code from Insight: Atari doesn't produce self-relocating
executables. What it produces is BASIC programs that have the
relocatable object code as DATA statements, POKEd into memory when
run. The relocator presented here gets appended to your standard
executable and relocates it "on the fly", then jumps to the start of
the relocated code.
Example: a subroutine call to within our own code:
  JSR print_banner
This is the first instruction in our program, so it will be found
at $4000 for the first assembly pass, and $4100 for the second.
Say print_banner ends up at $4123 when we assemble at $4000, and $4223
when assembling at $4100. Further, we determine MEMLO has $1D80. So,
when we relocate the program, it ends up at $1E00 (the start of the
next page). The target of the JSR instruction has to be adjusted
to match the new location where print_banner is going to be. After
relocation, the JSR $4123 reads JSR $1F23 ($4123 - $4000 + $1E00).
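The arithmetic for this example can be checked with a few lines of
Python. This is only a sketch: relocated_address is a hypothetical
name, and it assumes the program always goes to the page after the
one MEMLO points into, as in the example.

```python
def relocated_address(addr: int, org: int, memlo: int) -> int:
    """Map an address inside the program to its location above MEMLO."""
    dest_page = (memlo >> 8) + 1      # assumption: always the next page up
    offset = (org >> 8) - dest_page   # pages to subtract from each high byte
    if offset < 0:
        raise ValueError("ORG must be above MEMLO")
    new_high = (addr >> 8) - offset
    return (new_high << 8) | (addr & 0xFF)   # the low byte is never touched


# print_banner from the example: assembled at $4123, MEMLO is $1D80.
print(f"${relocated_address(0x4123, 0x4000, 0x1D80):04X}")
```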
The code that does the relocation, we'll call the relocator. The term
"relocating loader" is used elsewhere, but it's not accurate here: DOS
is the loader, and we're not replacing it.
The relocator is a small routine that gets appended to the first
executable (the $4000 one) as a segment, plus two data tables (one for
the original load, end, run, and init addresses, the other with
the addresses that need adjusting), plus an INITAD segment that runs
the relocator code. These all have to load at a fixed address, but
once they're finished running, they won't be needed again.
The relocator has to know the load address and the length of the main
segment of the program (the part it's going to relocate). What it
does:
1. Add 1 to the high byte of MEMLO to get the first page above it,
then subtract that from the high byte of the load address ($4000 in
the example). This gives us a positive number (we hope!) that is
the amount each address's high byte in the program should have
subtracted from it.
2. Iterate over the relocation data table, subtracting the
offset. Each table entry is the two-byte address of a byte that
needs to be changed (an absolute address that's "baked" into the
program).
3. Move the main segment to the start of the first page above MEMLO.
4. Set MEMLO to point to the byte after the end of the program
to protect it from being overwritten by e.g. BASIC or ASM/ED.
5. If the program has an init address, subtract the offset from it,
then jump to it. This runs the payload program's init routine.
6. If the program has a run address, subtract the offset from it,
storing the result in RUNAD. Then do an RTS to hand control back
to DOS. DOS will run the relocated code by jumping to the altered
RUNAD, in the usual way.
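The steps above can be modelled on a host machine with a 64K
bytearray standing in for Atari memory. This is a hedged sketch, not
the 6502 relocator itself: relocate, peek16 and poke16 are hypothetical
names, the fixup list is passed as a plain Python list rather than
read from the in-memory table, and the model assumes the destination
is always the page after the one MEMLO points into.

```python
MEMLO = 0x02E7   # OS pointer to the lowest free byte of RAM
RUNAD = 0x02E0   # DOS run address vector


def peek16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


def poke16(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def relocate(mem, start, end, run, init, fixups):
    """Apply the relocation steps; return the relocated init address (0 if none)."""
    dest_page = mem[MEMLO + 1] + 1            # assumption: always round up a page
    offset = (start >> 8) - dest_page         # step 1
    if offset < 0:
        raise ValueError("program ORG is below MEMLO")
    for addr in fixups:                       # step 2: patch high bytes in place
        mem[addr] = (mem[addr] - offset) & 0xFF
    length = end - start + 1
    dest = dest_page << 8
    mem[dest:dest + length] = mem[start:start + length]   # step 3: move the code
    poke16(mem, MEMLO, dest + length)         # step 4: protect the program
    new_init = init - (offset << 8) if init else 0        # step 5: caller JSRs here
    if run:                                   # step 6: DOS jumps through RUNAD later
        poke16(mem, RUNAD, run - (offset << 8))
    return new_init


mem = bytearray(0x10000)
poke16(mem, MEMLO, 0x1D80)
# $4000: JSR $4004 / $4003: RTS / $4004: RTS
mem[0x4000:0x4005] = bytes([0x20, 0x04, 0x40, 0x60, 0x60])
relocate(mem, 0x4000, 0x4004, run=0x4000, init=0, fixups=[0x4002])
print(mem[0x1E00:0x1E05].hex(" "), hex(peek16(mem, MEMLO)), hex(peek16(mem, RUNAD)))
```

The patch is done before the move here, matching the order of the
steps in the list.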
Notes:
- To keep things simple, the program must consist of a single
segment of code and data, followed by an init address and/or a run
address.
- If your program is a device driver or a "TSR", you should use an
init address, NOT a run address. This allows users to append your
program to e.g. an RS-232 driver, and maybe a RAMdisk driver too,
etc. Each driver should have an init address, because Atari
executables can have multiple init addresses.
- If your program is an application, it's usually better to use a run
address. If you use an init address, your program will run, but DOS
will still be "in the middle of" loading the executable, meaning
IOCB #1 will still be open for reading.
- The program's end address must be below $6C00, since that's where
the relocator and tables load. The reason for this restriction
is to allow the relocatable executable to work with a 16K cartridge.
The lowest sane start address for the program is probably $2000,
which allows the program to be 19KB in size... though $3000 is
a lot safer (15KB max).
- Whatever start address (ORG) you use for the program, it has to
be higher than the current MEMLO when the relocation is done.
That's why I said $3000 is safer than $2000: if someone uses a fancy
DOS and/or has lots of device drivers loaded, MEMLO could exceed
$2000, which would cause your program to crash when loaded.
- Also, the start address has to start on a page boundary ($xx00).
- The data table size must not exceed 4K. The table is compressed; see
"Relocation Table Format", below.
- The original Wilkinson scheme was done entirely in Atari BASIC.
I use a perl script to create the relocation tables and the
relocator itself becomes part of the relocatable program, so BASIC
is not required. The perl script will be rewritten in C at some
point, and the C program will run on either the Atari or on
a modern POSIX system.
- Indirect JMP instructions should always be used with care on the
6502. The two bytes of the vector have to be in the same page, due to
a 6502 bug: JMP ($xxFF) fetches the high byte of the target from
$xx00. Most 6502 asm programmers know how to handle this... but
with dynamically relocatable code, there's not really a good way to
do it. Best to avoid indirect JMPs. One simple workaround is to use
self-modifying code: Have an absolute JMP instruction in your code,
and store the indirect jump's destination there. Example:
  JMP (VECTOR)
...becomes:
  LDA VECTOR
  STA TRAMPOLINE+1
  LDA VECTOR+1
  STA TRAMPOLINE+2
  JMP TRAMPOLINE
  ; somewhere in the code you have this:
  TRAMPOLINE JMP $0000
Another way to do it would be to use call-by-RTS (push the jump
address minus one on the stack, then execute RTS).
- If your code has really tight cycle-counted timing loops, the timing
might get thrown off due to relocation causing a branch to cross a
page boundary, when it was originally not supposed to. This kind of
code generally only belongs in games and demos. Relocatable code is
usually used for things like device drivers or programming utilities.
Games "take over" the whole machine and don't have to care about MEMLO
or other software needing free RAM.
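To make the indirect JMP note above concrete, here is a small Python
model of how an NMOS 6502 fetches the target of JMP (VECTOR). It is a
sketch for illustration only; jmp_indirect_target is a hypothetical
name and the memory contents are made up.

```python
def jmp_indirect_target(mem, vector):
    """Return the address an NMOS 6502 jumps to for JMP (vector)."""
    lo = mem[vector]
    # The CPU increments only the low byte of the pointer, so the high
    # byte of the target is fetched from the same page as the low byte.
    hi_addr = (vector & 0xFF00) | ((vector + 1) & 0x00FF)
    return lo | (mem[hi_addr] << 8)


mem = bytearray(0x10000)
mem[0x4010], mem[0x4011] = 0x34, 0x12   # vector entirely inside one page
mem[0x40FF], mem[0x4100] = 0x34, 0x12   # vector straddling a page boundary
mem[0x4000] = 0x56                      # byte the CPU reads instead of $4100

print(hex(jmp_indirect_target(mem, 0x4010)))
print(hex(jmp_indirect_target(mem, 0x40FF)))
```

The first vector yields the intended target; the second picks up its
high byte from $4000 rather than $4100.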
Format of the relocatable executable:
- Segment with the original code, at the original load address.
- Segment with the relocator code and relocation tables.
- INITAD segment that runs the relocator code.
Note that the original RUNAD and INITAD segments (if any) don't appear
in the relocatable file as segments.
Relocation tables start immediately after the last byte of the relocator.
First table is 8 bytes (4 words):
- Original load address
- Original end address
- Original run address (or 0 for none)
- Original init address (or 0 for none)
The next N bytes are the high-byte relocation table. See below.
For the init address, if it's not zero, the relocator JSR's to it (at its
new location).
For the run address, if it's not zero, the relocator adjusts RUNAD,
and DOS uses RUNAD as usual when the program's done loading.
Example:
      *=$4000
  start:
      jsr set_color       ; $4000  JSR $4007
      jsr set_cursor      ; $4003  JSR $400E
      rts                 ; $4006
  set_color:
      lda bgcolor         ; $4007  LDA $4015
      sta COLOR2          ; $400A
      rts                 ; $400D
  set_cursor:
      lda cursor          ; $400E  LDA $4016
      sta CRSINH          ; $4011
      rts                 ; $4014
  bgcolor: .byte $00      ; $4015
  cursor:  .byte $01      ; $4016
      *=INITAD
      .word start
The address table for the above program:
  $00 $40 - code_start
  $16 $40 - code_end
  $00 $00 - code_run (no run address)
  $00 $40 - code_init
High byte relocation table:
  $02 $40 ; hi byte of JSR $4007 operand
  $05 $40 ; hi byte of JSR $400E operand
  $09 $40 ; hi byte of LDA $4015 operand
  $10 $40 ; hi byte of LDA $4016 operand
  $00 $00 ; terminator
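For reference, the two tables for this example can be produced on the
host side with a few lines of Python. This is a sketch under the
layout described above (four little-endian words, then two-byte
entries, then a $00 $00 terminator); build_tables is a hypothetical
name, not part of the actual tooling.

```python
import struct


def build_tables(start, end, run, init, fixups):
    """Return the address table followed by the high-byte relocation table."""
    header = struct.pack("<4H", start, end, run, init)   # 4 little-endian words
    entries = b"".join(struct.pack("<H", addr) for addr in fixups)
    return header + entries + b"\x00\x00"                # $00 $00 terminator


tables = build_tables(0x4000, 0x4016, 0x0000, 0x4000,
                      [0x4002, 0x4005, 0x4009, 0x4010])
print(tables.hex(" "))
print(len(tables), "bytes")
```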
Program loads from $4000 to $4016. If MEMLO was $1CFC, the relocator
will move the program to $1D00 - $1D16 and set MEMLO to $1D17. The
operand of the first instruction (was JSR $4007) will be altered
to $1D07 (aka $4007 - $4000 + $1D00), which is the address that the
subroutine got relocated to.
The original program assembled to a 32-byte file. The relocatable
version will be around 400 bytes: 28 bytes for the original file
(minus its INITAD segment), ~300 bytes for the relocator code, 8
bytes for the address table, and 10 bytes for the relocation table.
However, the relocator and tables are only used once, and can be
overwritten afterwards (so they count as free memory).
Relocation Table Format
Current implementation:
A list of addresses that need to be adjusted (high bytes of absolute
addresses), 2 bytes each, terminated with $00 $00.
Possible future implementation:
Bitmap. One bit per byte in the file. 1 if the address needs
adjusting, 0 if not. This *probably* will actually be smaller than
the list of addresses. Also has the advantage of being a fixed size,
easily calculated/predicted.
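A quick way to compare the two formats is to build both for the same
program. The sketch below is only an assumption about how the bitmap
might be laid out (most significant bit first, one bit per program
byte); the bit order is not specified above and build_bitmap is a
hypothetical name.

```python
def build_bitmap(start, end, fixups):
    """One bit per program byte; a set bit marks a high byte to adjust."""
    length = end - start + 1
    bitmap = bytearray((length + 7) // 8)
    for addr in fixups:
        offset = addr - start
        bitmap[offset // 8] |= 0x80 >> (offset % 8)   # assumption: MSB first
    return bytes(bitmap)


fixups = [0x4002, 0x4005, 0x4009, 0x4010]
bitmap = build_bitmap(0x4000, 0x4016, fixups)
list_size = 2 * (len(fixups) + 1)        # 2 bytes per entry, plus terminator
print(bitmap.hex(" "), len(bitmap), "bytes vs", list_size, "bytes")
```

With 2 bytes per list entry and 1 bit per program byte, the bitmap
wins whenever there is more than roughly one fixup per 16 bytes of
code.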
The relocator is 256 bytes long or less.
With a 16K cart inserted, the GR.0 display list is at $7C20.
We want the bitmap to end just below $7C00.
Bitmap table will always be 1/8 the code size.
If your code is 18880 bytes, the bitmap size is 2360 bytes.
Supposing you ORG at $2800, and allowing the full 256 bytes for the
relocator:
  code         - $2800 to $71BF
  relocator    - $71C0 to $72BF
  8-byte table - $72C0 to $72C7
  bitmap       - $72C8 to $7BFF
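The layout arithmetic can be reproduced with a short Python sketch.
It assumes the relocator takes its full 256-byte allowance and that
the four regions are packed back to back; layout is a hypothetical
helper, not part of the relocator.

```python
def layout(org, code_len, relocator_len=256, table_len=8):
    """Return {name: (first_address, last_address)} for each region."""
    bitmap_len = (code_len + 7) // 8          # one bit per byte of code
    regions = {}
    addr = org
    for name, size in (("code", code_len), ("relocator", relocator_len),
                       ("table", table_len), ("bitmap", bitmap_len)):
        regions[name] = (addr, addr + size - 1)
        addr += size
    return regions


for name, (first, last) in layout(0x2800, 18880).items():
    print(f"{name:10s} ${first:04X} to ${last:04X}")
```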