Atari Microsoft BASIC Notes
---------------------------

AMSB is actually a pretty cool BASIC for the Atari 8-bit. I never
got the chance to use it 'back in the day' because it was expensive,
required a floppy drive and at least 32K of RAM (my poor 400 had a
tape drive for the first few years), and then later on, there was
Turbo BASIC XL, which was cooler than AMSB, and also freeware.

This file is a collection of notes I made to myself while developing
listamsb. The information here might be useful (e.g. if you're trying
to repair a damaged AMSB file) and hopefully is interesting. Enjoy!

This file is part of the bw-atari8-tools source. You can get the
latest version of the source from:

https://slackware.uk/~urchlay/repos/bw-atari8-tools

...which you can either view with a web browser or use with the 'git
clone' command.

                                -- B. Watson <urchlay@slackware.uk>


Tokenized file format
---------------------

File begins with a 3-byte header:

offset | purpose
-------+-------------------------------------------------------
   0   | 0 for a normal program, 1 for LOCKed (encrypted).
   1   | LSB, program length, not counting the 3-byte header...
   2   | MSB, "        "

The program length should always be the actual file size minus 3.  If
it's not, the file has either been truncated or had junk added to the
end. In a LOCKed program, the program length bytes are not encrypted.
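
As a rough illustration of the header layout above, here's a minimal
Python sketch. It is not taken from listamsb; the function name
parse_amsb_header is made up for this example.

```python
def parse_amsb_header(data: bytes):
    """Return (locked, declared_len, actual_len) for a SAVEd AMSB file.

    Hypothetical helper, written only from the header table above.
    """
    if len(data) < 3:
        raise ValueError("file too short to hold the 3-byte header")
    if data[0] not in (0, 1):
        raise ValueError("first byte is not 0 or 1: not a tokenized file")
    locked = data[0] == 1
    declared_len = data[1] | (data[2] << 8)   # LSB first, then MSB
    actual_len = len(data) - 3                # everything after the header
    return locked, declared_len, actual_len
```

If declared_len and actual_len differ, the file has probably been
truncated or had junk added to the end.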

After the header, the lines of code (encrypted, for LOCKed programs).
Each line has a 4-byte header:

offset | purpose
-------+-------------------------------------------------------
   0   | LSB, address of the last byte of this line...
   1   | MSB, address  ...which is ignored on LOAD!
   2   | LSB of line number
   3   | MSB "  "    "

The rest of the line is the tokens, terminated by a $00 byte. The
next 2 bytes after the $00 are the last-byte address of the next line.

The last "line" of the program has a $0000 address, which indicates the
end of the program. Since the actual last line ends with a $00, that
means there will be three $00 bytes in a row as the last 3 bytes of
the file. And that's the *only* place 3 $00's in a row will occur.
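
The line layout above could be walked something like this. It's only
a sketch in Python, for an unLOCKed file; iter_amsb_lines is a made-up
name, and the code ignores the saved addresses (apart from the $0000
end marker) and finds each line's end by its $00 terminator instead.

```python
def iter_amsb_lines(data: bytes):
    """Yield (line_number, token_bytes) for each line of an unLOCKed file.

    Hypothetical helper; assumes 'data' is the whole SAVEd file.
    """
    pos = 3                                   # skip the 3-byte file header
    while pos + 2 <= len(data):
        pointer = data[pos] | (data[pos + 1] << 8)
        if pointer == 0:                      # $0000 marks the end of the program
            return
        if pos + 4 > len(data):
            raise ValueError("file ends inside a line header")
        line_number = data[pos + 2] | (data[pos + 3] << 8)
        end = data.find(b"\x00", pos + 4)     # tokens run up to the $00 terminator
        if end < 0:
            raise ValueError("line %d has no $00 terminator" % line_number)
        yield line_number, data[pos + 4:end]
        pos = end + 1                         # next line header follows the $00
    raise ValueError("file ends without the $0000 end marker")
```

Note that the search for $00 starts after the 4-byte line header,
because the line number itself can contain a $00 byte (e.g. line 10
is stored as $0a $00).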

Tokenization is "lightweight": numeric constants aren't tokenized,
they're just stored as ASCII digits, as typed. There's no "string
constant follows" token with a length byte like there is in Atari
BASIC; a string constant just starts with a double-quote, $22. Variable
names are not tokenized, either, they're just stored as-is (name in
ASCII, including trailing $ for strings, etc).

In fact the only things that are tokenized are BASIC keywords:
commands and functions... NOT including user functions defined
with DEF (those are stored as just the ASCII function name, like
variables).

There are 2 sets of tokens. One set is single-byte, $80 and up.
These are commands. The other set is functions, which are 2 bytes:
$FF followed by the token number. See amsbtok.h in the source for the
actual tokens.

AMSB saves the end-of-line pointers, but it totally ignores them
on LOAD. The SAVEd file format does *not* have a load address (as e.g.
Commodore BASIC does), so there's no way to know the address of the
start of the program (other than counting backwards from the next line,
since its address is known). It's not just a constant either: it
depends on what MEMLO was set to when the program was saved (which varies
depending on what version of AMSB you have, what DOS you boot, whether
or not you have the R: device driver loaded, etc etc).


Redundant Tokens
----------------

There are two separate tokens each for PRINT and AT:

token | text
------+-----------------------
 $ab  | "PRINT "
 $ac  | "PRINT"
 $df  | "AT("
 $e0  | "AT "

When tokenizing a line, AMSB will actually use the $ab token if
there's a space after PRINT (or ?), otherwise it will use the
$ac token. These lines actually get tokenized differently:

10 PRINT "HELLO"
10 PRINT"HELLO"

Same applies to the $df and $e0 AT tokens: if the user entered
"AT(X,Y)", $df is used. Otherwise, with "AT (X,Y)", $e0 is used
(followed by an ASCII left parenthesis).
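
To make the difference concrete, here's a small Python sketch of a
detokenizer that only knows the four tokens from the table above. The
names KNOWN_TOKENS and list_tokens are invented for this example, and
the tokenized bytes in the usage note are my reading of the rules in
this file, not dumps of real SAVEd programs. It does not try to handle
comments.

```python
# Only the tokens named in this section; see amsbtok.h for the full list.
KNOWN_TOKENS = {0xAB: "PRINT ", 0xAC: "PRINT", 0xDF: "AT(", 0xE0: "AT "}

def list_tokens(body: bytes) -> str:
    """Expand one tokenized line body into text (hypothetical helper).

    Unknown command tokens come out as <$xx>, function tokens as <fn $xx>.
    """
    out = []
    in_string = False
    i = 0
    while i < len(body):
        b = body[i]
        i += 1
        if in_string:
            if b == 0x7C:                 # '|' stands in for the closing quote
                out.append('"')
                in_string = False
            else:
                out.append(chr(b))        # string contents are stored as-is
        elif b == 0x22:                   # opening double-quote
            out.append('"')
            in_string = True
        elif b == 0xFF and i < len(body): # 2-byte function token
            out.append("<fn $%02x>" % body[i])
            i += 1
        elif b >= 0x80:                   # 1-byte command token
            out.append(KNOWN_TOKENS.get(b, "<$%02x>" % b))
        else:
            out.append(chr(b))            # plain ASCII: names, digits, spaces
    return "".join(out)
```

With this, list_tokens(b'\xab"HELLO|') gives PRINT "HELLO" and
list_tokens(b'\xac"HELLO|') gives PRINT"HELLO": the space is part of
the $ab token, not a separate byte.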

3 tokens include the opening parenthesis:

token | text
------+-----------------------
 $d2  | "TAB("
 $d6  | "SPC("
 $df  | "AT("

Normally in AMSB, it's OK to leave a space between a function name
and the left-paren. PEEK (123) and SIN (1) are both valid. However,
for SPC and TAB, no space is allowed, because the ( is part of the
token. AT would be the same way, except there's a separate token $e0
that *includes* the space. Weird, huh? A side effect of this is
that "SPC (10)" or "TAB (10)" won't be treated as a function call.
Instead, the SPC or TAB is treated as a variable name. If you write:

PRINT TAB (10);"HELLO"

...it'll print " 0 HELLO" at the start of the line[*], instead of "HELLO"
in the 10th column as you might have expected. It also means that AT,
TAB, and SPC are valid variable names in AMSB, which is an exception
to the rule that keywords can't be used as variable names (e.g. SIN=1
or STRING$="HELLO" are invalid).

[*] Unless you've assigned another value to TAB, of course.


Unused Tokens
-------------

If you look at the token list in amsbtok.h (or in a hex dump
of the AMSB executable or cartridge image), you'll see a lot of
double-quotes mixed in with the list. AMSB doesn't actually tokenize
the " character (it's stored as $22, its ASCII value), so these seem
to be placeholders, either because some tokens were deleted from the
language during its development, or else they're intended for some
future version of AMSB that never happened.

The weird quote tokens are $99, $c8 to $d0, $d5, and $e7 to $ed. If
you hexedit a program to replace a regular double-quote with one of
these tokens, it will list as either "" or just one ", but it will
cause a syntax error at runtime.


LOADing Untokenized Files
-------------------------

If the first byte of the file is anything other than $00 or $01,
AMSB's LOAD command reads it in as a text file (LISTed rather than
SAVEd).

When LOAD is reading a text file, if the last byte of the file isn't
an ATASCII EOL ($9b), you'll get #136 ERROR. The program doesn't get
deleted, but the last line of the file didn't get loaded. This could
happen if a LISTed file somehow got truncated.
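
The first-byte rule and the trailing-EOL check could be sketched like
so, in Python. The function name and the result strings are made up
for this example, and what AMSB does with a zero-length file isn't
covered by these notes, so the sketch just reports it as "empty".

```python
ATASCII_EOL = 0x9B

def classify_for_load(data: bytes) -> str:
    """Guess how AMSB's LOAD would treat a file (hypothetical helper)."""
    if not data:
        return "empty"                    # not covered by the notes above
    if data[0] == 0x00:
        return "tokenized"
    if data[0] == 0x01:
        return "tokenized, LOCKed"
    # Anything else is read as a LISTed (text) file.
    if data[-1] != ATASCII_EOL:
        return "text, last line lost (#136 ERROR)"
    return "text"
```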

While on the subject... the manual doesn't mention it, but if you LOAD
a text file without line numbers, the code gets executed in direct
mode during the load (like Atari BASIC's ENTER command does). This
means you could write scripts (batch files) for AMSB... though you'd
be better off using MERGE, rather than LOAD (MERGE is basically the
same thing as Atari BASIC's ENTER).


Program Length Header Mismatch
------------------------------

When AMSB's LOAD command executes, it reads the 3-byte header, then
reads as many bytes as the header's program length says.

If the header length is longer than the rest of the file, you get
a #136 ERROR (aka Atari's EOF), and the partially loaded program is
erased (basically it does a NEW).

If the length is shorter than the program, it'll stop loading no
matter how much more data is in the file. This means it can stop in
the middle of a line. It also means, if there was already a program in
memory that was longer than the program length, you get a "hybrid" mix
of the new program followed by the remainder of the old one. This is
because the three $00 bytes at the end of the program weren't read in.

If the program length is correct for the actual program (so the three
$00 bytes get read), but there's extra data appended to the file, AMSB
will never read the extra data at all.
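
The three cases above can be told apart from the file alone. Here's a
Python sketch for unLOCKed files (in a LOCKed file the three $00 bytes
are encrypted, so this check wouldn't apply as written). The function
name and the result strings are invented for this example.

```python
def load_outcome(data: bytes) -> str:
    """Predict LOAD's behaviour for an unLOCKed tokenized file.

    Hypothetical helper; 'data' is the whole file, header included.
    """
    declared = data[1] | (data[2] << 8)   # program length from the header
    program = data[3:]
    if declared > len(program):
        return "#136 ERROR, program erased"
    loaded = program[:declared]           # LOAD reads exactly this much
    if not loaded.endswith(b"\x00\x00\x00"):
        # The end marker was never read, so old program text can survive.
        return "stops early; may mix with the program already in memory"
    if declared < len(program):
        return "loads normally; trailing data ignored"
    return "loads normally"
```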


String Limitations
------------------

String literals in AMSB cannot contain the | or ATASCII heart
characters.

AMSB uses | as a terminator for quoted strings, e.g. "STRING" will
be tokenized as: "STRING|

If you try to use a | in a quoted string, it gets turned into a double
quote: "FOO|BAR" comes out as "FOO"BAR which is a syntax error!

String variables can store | but only with e.g. CHR$(124) or reading
from a file: it's string *literals* that don't allow it.

The reason | is used for a terminating quote is to allow doubling up
the quotes to embed them in a string:

A$ = "HAS ""QUOTES"""
PRINT A$ will print: HAS "QUOTES"

At first I thought "no pipe characters in strings, WTF man?" but it's
probably no worse than Atari BASIC's "no quotes in string constants"
rule. It *would* be nice if the AMSB manual actually documented the
fact that | can't occur in a string constant. Not documenting it makes
it a bug... and they have unused tokens in the $Fx range, I don't see
why they had to use a printing character for this.

You also can't put a heart (ATASCII character 0) in a string
literal. It will be treated as the end of the line, as though you
pressed Enter (and anything else on the line is ignored). This isn't
documented in the manual, either.

Like the | character, you can use CHR$(0) to store a heart in a string
and it will work correctly.
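
If you're generating AMSB source from another program, a check along
these lines would catch both cases before AMSB mangles the line. It's
a Python sketch; literal_problems is a made-up name, and the input is
assumed to be the text *between* the quotes.

```python
def literal_problems(text: str):
    """Return (index, reason) pairs for characters that AMSB can't hold
    in a quoted string literal (hypothetical helper)."""
    problems = []
    for i, ch in enumerate(text):
        if ch == "|":
            problems.append((i, "| is turned into a double quote"))
        elif ch == "\x00":
            problems.append((i, "heart (character 0) ends the line"))
    return problems
```

An empty result means the text is safe to put in quotes; otherwise
the offending characters have to be built with CHR$() instead.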


Line Number Range
-----------------

AMSB doesn't allow entering line numbers above 63999, but if a file
is e.g. hex-edited to have a line number that's out of range, it will
LIST and RUN just fine... except that it's impossible to GOTO or GOSUB
to an out-of-range line. It will still execute if program flow falls
into it.


Differences Between Versions
----------------------------

The language is the same in AMSB versions 1 and 2. Tokenized files
made by one version will LOAD and RUN in the other version.

Version 1, the disk version, always has the full set of commands
available. Version 2, the cart, only has the full set if the extension
disk is booted. The missing ones still get tokenized, but you get SN
ERROR at runtime if you try to execute them. This doesn't affect the
detokenizer at all. The missing commands:

AUTO
DEF (the string version; numeric is still present)
NOTE
RENUM
TRON
TROFF
DEL
USING
STRING$ (function, not a command)

RENUM only works in direct mode, not a program. Executing it
gives a FUNCTION CALL ERROR.

AUTO is (oddly) allowed in a program. Executing it exits the program
and puts you back in the editor, in auto-numbering mode.

It would seem weird to have POINT available but not NOTE... except
that AMSB doesn't even *have* POINT. Instead, the disk addresses
returned by NOTE are used with AT() in a PRINT statement. Not sure
if AT() works without the extensions loaded, but it won't be useful
anyway without NOTE.

One other difference between versions 1 and 2: version 2 will LOAD and
RUN the file D:AUTORUN.AMB at startup, if it exists.


Colon Weirdness
---------------

AMSB allows comments to be started with the ! and ' characters (as
well as the traditional REM). For the ! and ' variety, if they
come at the end of a line after some code, you don't have to put a colon.
Example:

10 GRAPHICS 2+16 ! NO TEXT WINDOW

However... in the tokenized format, there *is* a tokenized colon
just before the tokenized ! or ' character. LIST doesn't display it.
If you did put a colon:

10 CLOSE #1:! WE'RE DONE WITH THE FILE

...then there will be *two* colons in the tokenized file, and only
one will be LISTed.

The ELSE keyword works the same way. In this line:

10 IF A THEN PRINT ELSE STOP

...there is actually a : character just before the token for ELSE.

Even weirder: you can put as many colons in a row as you like, and
AMSB will treat them like a single colon. This line of code is valid
and runs correctly:

10 PRINT "FOO"::::::PRINT "BAR"

These colons are displayed normally in LIST output.
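
A detokenizer that wants to match LIST has to hide that one colon. A
Python sketch of the rule, working on a line that has already been
split into text items (the item list and the names here are made up
for this example; the real token values are in amsbtok.h):

```python
# Items that carry an invisible colon in front of them.
HIDDEN_AFTER_COLON = {"!", "'", "ELSE"}

def visible_items(items):
    """Drop the one colon that sits directly before !, ' or ELSE.

    Hypothetical helper; 'items' is a list of already-decoded strings.
    """
    out = []
    for i, item in enumerate(items):
        nxt = items[i + 1] if i + 1 < len(items) else None
        if item == ":" and nxt in HIDDEN_AFTER_COLON:
            continue                      # stored in the file, not LISTed
        out.append(item)
    return out
```

Only the colon immediately before the !, ' or ELSE is dropped, so a
line stored with two colons before a comment still lists with one,
and runs of colons between ordinary statements are left alone.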


Memory Usage
------------

On a 48K/64K Atari, FRE(0) for AMSB 1 with DOS booted (since you can't
use it without) but no device drivers is 21020. MEMLO is awfully high
($6a00).

For AMSB 2 with DOS booted, but without the extensions loaded, FRE(0)
is 24352. With extensions it's 20642 (even though the banner says 20644
BYTES FREE).

AMSB 2 without DOS gives you 29980, but how are you gonna load or save
programs without DOS? Nobody wants to use cassette, especially not
people who could afford to buy the AMSB II cartridge.


LOCKed Programs
---------------

If you save a program with SAVE "filename" LOCK, it gets saved in an
"encrypted" form. Loading a locked program disables LISTing or
editing the program (you get LK ERROR if you try).

The "encryption" is no better than ROT13. To encrypt, subtract each
byte from 0x54 (in an 8-bit register, using twos complement). To
decrypt, do the same. This is a reciprocal cipher, and you can think
of it as the binary equivalent of ROT13.

You can tell a LOCKed program because its first byte will be 1 instead
of 0. The next 2 bytes (the program length) are unencrypted. The rest of
the file is encrypted with the lame scheme described above.
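
In Python, the scheme described above comes out to something like the
following sketch. The function names are made up for this example;
this is not the code listamsb uses.

```python
def amsb_crypt(data: bytes) -> bytes:
    """Encrypt or decrypt a SAVEd file; the same operation does both.

    The 3-byte header is passed through untouched.
    """
    header, body = data[:3], data[3:]
    body = bytes((0x54 - b) & 0xFF for b in body)   # 8-bit wraparound subtract
    return header + body

def unlock(data: bytes) -> bytes:
    """Return a plain copy of a LOCKed file (first byte 1 becomes 0)."""
    if data[:1] != b"\x01":
        return data                       # not LOCKed, nothing to do
    return b"\x00" + amsb_crypt(data)[1:]
```

Since $54 minus $00 is $54, the three $00 bytes at the end of the
program show up as three $54 bytes in a LOCKed file.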

When AMSB has a LOCKed program loaded into memory, it's *not* stored
encrypted in RAM. It would be perfectly possible to write BASIC code
using direct mode to write the tokenized program out to disk. The
program starts at MEMLO and extends up to the first occurrence of
three $00 bytes. The hardest part of this would be generating the
header using only direct-mode BASIC statements (but it could be done).

However... there's no need to do that. AMSB has a flag that tells it
whether or not the currently-loaded program is LOCKed. You can just
clear the flag:

POKE 168,0

Now AMSB won't consider the program LOCKed, and you can SAVE a regular
copy of it (and LIST, edit, etc).


Line Length Limit
-----------------

In the editor, after a POKE 82,0 (to set the left margin to 0), you
can enter 120 characters (3 screen lines) on a logical line. If you
enter a program line that way *without* a space after the line number,
then LIST it, it will be 121 characters long, because AMSB will
display a space after the line number.

If you use a text editor (or write a program) to create an untokenized
BASIC program, you can have a line of code that's 125 characters
long. AMSB will accept it just fine, with LOAD. If a line is 126
characters or longer, AMSB will silently ignore that line when
LOADing.
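
Since the over-long lines are dropped silently, it may be worth
checking a generated text file before LOADing it. A Python sketch,
assuming the 125-character limit counts the line number but not the
ATASCII EOL (that's my reading of the above, not something I've
verified byte-for-byte); the names are invented for this example.

```python
ATASCII_EOL = b"\x9b"
MAX_LOADABLE = 125   # longest text line LOAD accepts, per the notes above

def overlong_lines(data: bytes):
    """Return (index, length) for each line of a LISTed file that LOAD
    would silently drop. Index counts lines in the file from 1.
    Hypothetical helper; lengths exclude the EOL (an assumption).
    """
    lines = data.split(ATASCII_EOL)
    if lines and lines[-1] == b"":
        lines.pop()            # a trailing EOL leaves one empty piece
    return [(i, len(line))
            for i, line in enumerate(lines, 1)
            if len(line) > MAX_LOADABLE]
```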

If you create a 125-character line (with a text editor) consisting
only of a comment that begins with ! or ', without a space after the
line number, LOAD it, then SAVE it, that line will be 129 bytes long
in tokenized form. AMSB will LOAD it with no problems.

If you hex-edit a SAVEd file to create a longer line, AMSB will
accept that, too... up to 255 bytes. At 256 bytes, AMSB will lock
up after LOAD.


Crunching
---------

AMSB stores spaces in the tokenized program, just like other 8-bit
MS BASICs do, but it requires you to put spaces between keywords and
variables (unlike e.g. Commodore 64 BASIC). This seems to be because
AMSB allows keywords inside of variable names: you can have a variable
called LIFE (which contains the keyword IF) in AMSB, but you can't in
C=64 BASIC (which gives a syntax error because it sees "L IF E").

This applies to numbers, too: POKE710,0 is a syntax error in
AMSB. This is because POKE710 is actually a valid variable name: try
POKE710=123 followed by PRINT POKE710.

However. The spaces aren't needed when the program is RUN. It would be
possible to remove all the spaces outside of strings or comments and
the program would still work fine.