Diffstat (limited to 'amsb40.txt')
-rw-r--r-- | amsb40.txt | 668 |
1 files changed, 668 insertions, 0 deletions
diff --git a/amsb40.txt b/amsb40.txt new file mode 100644 index 0000000..5c6f537 --- /dev/null +++ b/amsb40.txt @@ -0,0 +1,668 @@ +AMSB(7) Urchlay AMSB(7) + +NAME + amsb - Atari MS BASIC Notes + +DESCRIPTION + Atari Microsoft BASIC is actu- + ally a pretty cool BASIC for + the Atari 8-bit. I never got + the chance to use it 'back in + the day' because it was expen- + sive, required a floppy drive + and at least 32K of RAM (my + poor 400 had a tape drive for + the first few years), and then + later on, there was Turbo BASIC + XL, which was cooler than AMSB, + and also freeware. + + This file is a collection of + notes I made to myself while + developing listamsb. The infor- + mation here might be useful + (e.g. if you're trying to re- + pair a damaged AMSB file) and + hopefully is interesting. En- + joy! + + This file is part of the + bw-atari8-utils source. You can + get the latest version of the + source from: + + https://slackware.uk/~urchlay/repos/bw-atari8-tools + + ...which you can either view + with a web browser or use with + the 'git clone' command. + +NOTES + Tokenized file format + File begins with a 3-byte + header: + + +-------+---------------+ + |Offset | Purpose | + +-------+---------------+ + |0 | 0 for a nor- | + | | mal program, | + | | 1 for LOCKed | + | | (encrypted) | + +-------+---------------+ + |1 | LSB, program | + | | length, not | + | | counting the | + | | 3-byte header | + +-------+---------------+ + |2 | MSB, program | + | | length | + +-------+---------------+ + + The program length should al- + ways be the actual file size + minus 3. If it's not, the file + has either been truncated or + had junk added to the end. In a + LOCKed program, the program + length bytes are not encrypted. + + After the header, the lines of + code (encrypted, for LOCKed + programs). Each line has a + 4-byte header: + + +--+--------------+ + |0 | LSB, address | + | | of the last | + | | byte of this | + | | line... 
| + +--+--------------+ + |1 | MSB, address | + | | ...which is | + | | ignored on | + | | LOAD! | + +--+--------------+ + |2 | LSB, line | + | | number | + +--+--------------+ + |3 | MSB, line | + | | number | + +--+--------------+ + + The rest of the line is the to- + kens, terminated by a $00 byte. + The next 2 bytes after the $00 + is the last-byte offset of the + next line. + + The last "line" of the program + has a $0000 offset, which indi- + cates the end of the program. + Since the actual last line ends + with a $00, that means there + will be three $00 bytes in a + row as the last 3 bytes of the + file. And that's the only place + 3 $00's in a row will occur. + + Tokenization is "lightweight": + there are no tokenized numer- + ics, they're just stored as + ASCII characters, as typed. + There's no "string constant + follows" token like there is in + Atari BASIC (well, there is, + it's just a double-quote, $22. + There's no length byte). Vari- + able names are not tokenized, + either, they're just stored + as-is (name in ASCII, including + trailing $ for strings, etc). + Numeric constants are just + stored as ASCII digits, just as + you typed them. + + In fact the only things that + are tokenized are BASIC key- + words: commands and func- + tions... NOT including user + functions defined with DEF + (those are stored as just the + ASCII function name, like vari- + ables). + + There are 2 sets of tokens. One + set is single-byte, $80 and up. + These are commands. The other + set is functions, which are 2 + bytes: $FF followed by the to- + ken number. See amsbtok.h in + the source for the actual to- + kens. + + AMSB saves the end-of-line + pointers, but it totally ig- + nores them on LOAD. The SAVEd + file format does not have a + load address (as e.g. Com- + modore BASIC does), so there's + no way to know the address of + the start of the program (other + than counting backwards from + the next line, since its ad- + dress is known). 
It's not just + a constant either: it depends + on what MEMLO was set to when + the program was saved (which + varies depending on what ver- + sion of AMSB you have, what DOS + you boot, whether or not you + have the R: device driver + loaded, etc etc). + + Redundant Tokens + There are two separate tokens + each for PRINT and AT: + + +----+-------+ + |$ab | PRINT | + +----+-------+ + |$ac | PRINT | + +----+-------+ + |$df | AT( | + +----+-------+ + |$e0 | AT | + +----+-------+ + + When tokenizing a line, AMSB + will actually use the $ab token + if there's a space after PRINT + (or ?), otherwise it will use + the $ac token. These lines ac- + tually get tokenized differ- + ently: + + 10 PRINT "HELLO" + 10 PRINT"HELLO" + + Same applies to the $df and $e0 + AT tokens: if the user entered + "AT(X,Y)", $df is used. Other- + wise, with "AT (X,Y)", $e0 is + used (followed by an ASCII left + parenthesis). + + 3 tokens include the opening + parenthesis: + + +----+------+ + |$d2 | TAB( | + +----+------+ + |$d6 | SPC( | + +----+------+ + |$df | AT( | + +----+------+ + + Normally in AMSB, it's OK to + leave a space between a func- + tion name and the left-paren. + PEEK (123) and SIN (1) are both + valid. However, for SPC and + TAB, no space is allowed, be- + cause the ( is part of the to- + ken. AT would be the same way, + except there's a separate token + $e0 that includes the space. + Weird, huh? A side effect of + this is that "SPC (10)" or "TAB + (10)" won't be treated as a + function call. Instead, the + SPC or TAB is treated as a + variable name. If you write: + + PRINT TAB (10);"HELLO" + + ...it'll print " 0 HELLO" at + the start of the line[*], in- + stead of "HELLO" in the 10th + column as you might have ex- + pected. It also means that AT, + TAB, and SPC are valid variable + names in AMSB, which is an ex- + ception to the rule that key- + words can't be used as variable + names (e.g. SIN=1 or + STRING$="HELLO" are invalid). 
+ + [*] Unless you've assigned an- + other value to TAB, of course. + + Unused Tokens + If you look at the token list + in amsbtok.h (or in a hex dump + of the AMSB executable or car- + tridge image), you'll see a lot + of double-quotes mixed in with + the list. AMSB doesn't actually + tokenize the " character (it's + stored as $22, its ASCII + value), so these seem to be + placeholders, either because + some tokens were deleted from + the language during its devel- + opment, or else they're in- + tended for some future version + of AMSB that never happened. + + The weird quote tokens are $99, + $c8 to $d0, $d5, and $e7 to + $ed. If you hex-edit a program + to replace a regular dou- + ble-quote with one of these to- + kens, it will list as either "" + or just one ", but it will + cause a syntax error at run- + time. + + LOADing Untokenized Files + If the first byte of the file + is anything other than $00 or + $01, AMSB's LOAD command reads + it in as a text file (LISTed + rather than SAVEd). + + When LOAD is reading a text + file, if the last byte of the + file isn't an ATASCII EOL + ($9b), you'll get #136 ERROR. + The program doesn't get + deleted, but the last line of + the file doesn't get loaded. + This could happen if a LISTed + file somehow got truncated. + + While on the subject... the + manual doesn't mention it, but + if you LOAD a text file without + line numbers, the code gets ex- + ecuted in direct mode during + the load (like Atari BASIC's + ENTER command does). This means + you could write scripts (batch + files) for AMSB... though you'd + be better off using MERGE, + rather than LOAD (MERGE is ba- + sically the same thing as Atari + BASIC's ENTER). + + Program Length Header Mismatch + When AMSB's LOAD command exe- + cutes, it reads the 3-byte + header, then reads as many + bytes as the header's program + length says.
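The header and line layout described in the notes above can be checked outside of AMSB. What follows is only a rough Python sketch, not code from listamsb: the function names parse_header and walk_lines are made up for illustration, and it assumes an unencrypted (not LOCKed) program body in which no $00 byte occurs inside a line's tokens.

```python
def parse_header(data: bytes):
    """Return (locked, declared_len, actual_len) for a SAVEd AMSB file.

    Hypothetical helper: byte 0 is the LOCK flag, bytes 1-2 are the
    little-endian program length, which should equal file size minus 3.
    """
    if len(data) < 3:
        raise ValueError("file is too short to hold the 3-byte header")
    if data[0] not in (0, 1):
        raise ValueError("first byte is not 0 or 1 (LISTed text file?)")
    declared = data[1] | (data[2] << 8)
    actual = len(data) - 3
    return data[0] == 1, declared, actual


def walk_lines(body: bytes):
    """Yield (line_number, token_bytes) for an unencrypted program body.

    The 2-byte end-of-line address is only tested for $0000 (end of
    program); its value is otherwise ignored, as AMSB's LOAD does.
    """
    pos = 0
    while True:
        if pos + 2 > len(body):
            raise ValueError("ran off the end before the $0000 end marker")
        if body[pos] == 0 and body[pos + 1] == 0:
            return  # $0000 offset: end of program
        if pos + 4 > len(body):
            raise ValueError("truncated 4-byte line header")
        lineno = body[pos + 2] | (body[pos + 3] << 8)
        # Tokens run up to the terminating $00 byte; index() raises
        # ValueError if the terminator is missing (truncated file).
        end = body.index(0, pos + 4)
        yield lineno, body[pos + 4:end]
        pos = end + 1
```

If declared_len and actual_len from parse_header differ, the file has been truncated or has junk appended. Pass data[3:] to walk_lines to list the line numbers.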
+ + If the header length is longer + than the rest of the file, you + get a #136 ERROR (aka Atari's + EOF), and the partially loaded + program is erased (basically it + does a NEW). + + If the length is shorter than + the program, it'll stop loading + no matter how much more data is + in the file. This means it can + stop in the middle of a line. + It also means, if there was al- + ready a program in memory that + was longer than the program + length, you get a "hybrid" mix + of the new program followed by + the remainder of the old one. + This is because the three $00 + bytes at the end of the program + weren't read in. + + If the program length is cor- + rect for the actual program (so + the three $00 bytes get read), + but there's extra data appended + to the file, AMSB will never + read the extra data at all. + + String Limitations + String literals in AMSB cannot + contain the | or ATASCII heart + characters. + + AMSB uses | as a terminator for + quoted strings, e.g. "STRING" + will be tokenized as: "STRING| + + If you try to use a | in a + quoted string, it gets turned + into a double quote: "FOO|BAR" + comes out as "FOO"BAR which is + a syntax error! + + String variables can store | + but only with e.g. CHR$(124) or + reading from a file: it's + string literals that don't al- + low it. + + The reason | is used for a ter- + minating quote is to allow dou- + bling up the quotes to embed + them in a string: + + A$ = "HAS ""QUOTES""" + + PRINT A$ will print: HAS + "QUOTES" + + At first I thought "no pipe + characters in strings, WTF + man?" but it's probably no + worse than Atari BASIC's "no + quotes in string constants" + rule. It would be nice if the + AMSB manual actually documented + the fact that | can't occur in + a string constant. Not docu- + menting it makes it a bug... + and they have unused tokens in + the $Fx range, so I don't see + why they had to use a printing + character for this.
+ + You also can't put a heart + (ATASCII character 0) in a + string literal. It will be + treated as the end of the line, + as though you pressed Enter + (and anything else on the line + is ignored). This isn't docu- + mented in the manual, either. + + As with the | character, you + can use CHR$(0) to store a + heart in a string and it will + work correctly. + + Line Number Range + AMSB doesn't allow entering + line numbers above 63999, but + if a file is e.g. hex-edited to + have a line number that's out + of range, it will LIST and RUN + just fine... except that it's + impossible to GOTO or GOSUB to + an out-of-range line. It will + still execute if program flow + falls into it. + + Differences Between Versions + The language is the same in + AMSB versions 1 and 2. Tok- + enized files made by one ver- + sion will LOAD and RUN in the + other version. + + Version 1, the disk version, + always has the full set of com- + mands available. Version 2, the + cart, only has the full set if + the extension disk is booted. + The missing ones still get tok- + enized, but you get SN ERROR at + runtime if you try to execute + them. This doesn't affect the + detokenizer at all. The missing + commands: + + AUTO + DEF (string version only) + NOTE + RENUM + TRON + TROFF + DEL + USING + STRING$ (function) + + RENUM only works in direct + mode, not in a program. Execut- + ing it in a program gives a + FUNCTION CALL ERROR. + + AUTO is (oddly) allowed in a + program. Executing it exits the + program and puts you back in + the editor, in auto-numbering + mode. + + It would seem weird to have + POINT available but not NOTE... + except that AMSB doesn't even + have POINT. Instead, the disk + addresses returned by NOTE are + used with AT() in a PRINT + statement. Not sure if AT() + works without the extensions + loaded, but it won't be useful + anyway without NOTE. + + One other difference between + versions 1 and 2: version 2 + will LOAD and RUN the file + D:AUTORUN.AMB at startup, if it + exists.
+ + Colon Weirdness + AMSB allows comments to be + started with the ! and ' char- + acters (as well as the tradi- + tional REM). For the ! and ' + variety, if they come at the + end of a line after some code, + you don't have to put a colon. + Example: + + 10 GRAPHICS 2+16 ! NO TEXT + + However... in the tokenized + format, there is a tokenized + colon just before the tokenized + ! or ' character. LIST doesn't + display it. If you did put a + colon: + + 10 CLOSE #1:! WE'RE DONE + + ...then there will be two + colons in the tokenized file, + and only one will be LISTed. + + The ELSE keyword works the same + way. In this line: + + 10 IF A THEN PRINT ELSE STOP + + ...there is actually a : char- + acter just before the token for + ELSE. + + Even weirder: you can put as + many colons in a row as you + like, and AMSB will treat them + like a single colon. This line + of code is valid and runs cor- + rectly: + + 10 PRINT "A"::::::PRINT "A" + + These colons are displayed nor- + mally in LIST output. + + Memory Usage + On a 48K/64K Atari, FRE(0) for + AMSB 1 with DOS booted (since + you can't use it without) but + no device drivers is 21020. + MEMLO is awfully high ($6a00). + + For AMSB 2 with DOS booted, but + without the extensions loaded, + FRE(0) is 24352. With exten- + sions it's 20642 (even though + the banner says 20644 BYTES + FREE). + + AMSB 2 without DOS gives you + 29980, but how are you gonna + load or save programs without + DOS? Nobody wants to use cas- + sette, especially not people + who could afford to buy the + AMSB II cartridge. + + LOCKed Programs + If you save a program with SAVE + "filename" LOCK, it gets saved + in an "encrypted" form. Loading + a locked program disables + LISTing or editing the program + (you get LK ERROR if you try). + + The "encryption" is no better + than ROT13. To encrypt, sub- + tract each byte from 0x54 (in + an 8-bit register, using twos + complement). To decrypt, do the + same.
This is a reciprocal ci- + pher, and you can think of it + as the binary equivalent of + ROT13. + + You can tell a LOCKed program + because its first byte will be + 1 instead of 0. The next 2 + bytes (the program length) are + unencrypted. The rest of the + file is encrypted with the lame + scheme described above. + + When AMSB has a LOCKed program + loaded into memory, it's not + stored encrypted in RAM. It + would be perfectly possible to + write BASIC code using direct + mode to write the tokenized + program out to disk. The pro- + gram starts at MEMLO and ex- + tends up to the first occur- + rence of three $00 bytes. The + hardest part of this would be + generating the header using + only direct-mode BASIC state- + ments (but it could be done). + + However... there's no need to + do that. AMSB has a flag that + tells it whether or not the + currently-loaded program is + LOCKed. You can just clear the + flag: + + POKE 168,0 + + Now AMSB won't consider the + program LOCKed, and you can + SAVE a regular copy of it (and + LIST, edit, etc). + + Line Length Limit + In the editor, after a POKE + 82,0 (to set the left margin to + 0), you can enter 120 charac- + ters (3 screen lines) on a log- + ical line. If you enter a pro- + gram line that way without a + space after the line number, + then LIST it, it will be 121 + characters long, because AMSB + will display a space after the + line number. + + If you use a text editor (or + write a program) to create an + untokenized BASIC program, you + can have a line of code that's + 125 characters long. AMSB will + accept it just fine, with LOAD. + If a line is 126 characters or + longer, AMSB will silently ig- + nore that line when LOADing. + + If you create a 125-character + line (with a text editor) con- + sisting only of a comment that + begins with ! or ', without a + space after the line number, + LOAD it, then SAVE it, that + line will be 129 bytes long in + tokenized form. AMSB will LOAD + it with no problems.
+ + If you hex-edit a SAVEd file to + create a longer line, AMSB will + accept that, too... up to 255 + bytes. At 256 bytes, AMSB will + lock up after LOAD. + + Crunching + AMSB stores spaces in the tok- + enized program, just like other + 8-bit MS BASICs do, but it re- + quires you to put spaces be- + tween keywords and variables + (unlike e.g. Commodore 64 BA- + SIC). This seems to be because + AMSB allows keywords inside of + variable names: you can have a + variable called LIFE (which + contains the keyword IF) in + AMSB, but you can't in C=64 BA- + SIC (which gives a syntax error + because it sees "L IF E"). + + This applies to numbers, too: + POKE710,0 is a syntax error in + AMSB. This is because POKE710 + is actually a valid variable + name: try POKE710=123 followed + by PRINT POKE710. + + However. The spaces aren't + needed when the program is RUN. + It would be possible to remove + all the spaces outside of + strings or comments and the + program would still work fine. + +COPYRIGHT + WTFPL. See + http://www.wtfpl.net/txt/copying + for details. + +AUTHOR + B. Watson + + Email: urchlay@slackware.uk + + IRC: Urchlay on irc.libera.chat + ##atari. + +0.2.2 2025-03-13 AMSB(7) |
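The LOCK scheme described under "LOCKed Programs" above (subtract each byte from 0x54 in an 8-bit register) is small enough to try out in a few lines. This is a minimal Python sketch under that description only; the function names lock_xform and unlock_file are hypothetical and do not come from the bw-atari8-tools source.

```python
def lock_xform(body: bytes) -> bytes:
    """Apply the LOCK transform: each byte becomes (0x54 - byte) mod 256.

    The same call both encrypts and decrypts, because
    0x54 - (0x54 - b) == b in 8-bit arithmetic.
    """
    return bytes((0x54 - b) & 0xFF for b in body)


def unlock_file(data: bytes) -> bytes:
    """Return an unLOCKed copy of a SAVEd file image.

    Byte 0 is the LOCK flag; bytes 1-2 (program length) are stored
    unencrypted and are copied through unchanged.
    """
    if len(data) < 3 or data[0] != 1:
        return data  # not a LOCKed file; leave it alone
    return bytes([0]) + data[1:3] + lock_xform(data[3:])
```

Because the transform is its own inverse, running lock_xform twice on any data gives back the original bytes. That makes a handy sanity check.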