AMSB(7)                          Urchlay                          AMSB(7)

NAME
       amsb - Atari MS BASIC Notes

DESCRIPTION
       Atari Microsoft BASIC is actually a pretty cool BASIC for the Atari
       8-bit. I never got the chance to use it 'back in the day' because it
       was expensive, required a floppy drive and at least 32K of RAM (my
       poor 400 had a tape drive for the first few years), and then later
       on, there was Turbo BASIC XL, which was cooler than AMSB, and also
       freeware.

       This file is a collection of notes I made to myself while developing
       listamsb. The information here might be useful (e.g. if you're
       trying to repair a damaged AMSB file) and hopefully is interesting.
       Enjoy!

       This file is part of the bw-atari8-tools source. You can get the
       latest version of the source from:

              https://slackware.uk/~urchlay/repos/bw-atari8-tools

       ...which you can either view with a web browser or use with the
       'git clone' command.

NOTES
   Tokenized file format
       The file begins with a 3-byte header:

       +-------+---------------------------------------------------+
       |Offset | Purpose                                           |
       +-------+---------------------------------------------------+
       |0      | 0 for a normal program, 1 for LOCKed (encrypted)  |
       +-------+---------------------------------------------------+
       |1      | LSB, program length, not counting the 3-byte      |
       |       | header                                            |
       +-------+---------------------------------------------------+
       |2      | MSB, program length                               |
       +-------+---------------------------------------------------+

       The program length should always be the actual file size minus 3.
       If it's not, the file has either been truncated or had junk added
       to the end. In a LOCKed program, the program length bytes are not
       encrypted.

       After the header come the lines of code (encrypted, for LOCKed
       programs). Each line has a 4-byte header:

       +-------+---------------------------------------------------+
       |Offset | Purpose                                           |
       +-------+---------------------------------------------------+
       |0      | LSB, address of the last byte of this line...     |
       +-------+---------------------------------------------------+
       |1      | MSB, address ...which is ignored on LOAD!         |
       +-------+---------------------------------------------------+
       |2      | LSB, line number                                  |
       +-------+---------------------------------------------------+
       |3      | MSB, line number                                  |
       +-------+---------------------------------------------------+

       The rest of the line is the tokens, terminated by a $00 byte.
       The next 2 bytes after the $00 are the last-byte offset of the next
       line.

       The last "line" of the program has a $0000 offset, which indicates
       the end of the program. Since the actual last line ends with a $00,
       that means there will be three $00 bytes in a row as the last 3
       bytes of the file. And that's the only place 3 $00's in a row will
       occur.

       Tokenization is "lightweight": there are no tokenized numerics,
       they're just stored as ASCII characters, as typed. There's no
       "string constant follows" token like there is in Atari BASIC (well,
       there is, it's just a double-quote, $22. There's no length byte).
       Variable names are not tokenized, either, they're just stored as-is
       (name in ASCII, including trailing $ for strings, etc).

       In fact the only things that are tokenized are BASIC keywords:
       commands and functions... NOT including user functions defined with
       DEF (those are stored as just the ASCII function name, like
       variables).

       There are 2 sets of tokens. One set is single-byte, $80 and up.
       These are commands. The other set is functions, which are 2 bytes:
       $FF followed by the token number. See amsbtok.h in the source for
       the actual tokens.

       AMSB saves the end-of-line pointers, but it totally ignores them on
       LOAD. The SAVEd file format does not have a load address (as e.g.
       Commodore BASIC does), so there's no way to know the address of the
       start of the program (other than counting backwards from the next
       line, since its address is known). It's not just a constant either:
       it depends on what MEMLO was set to when the program was saved
       (which varies depending on what version of AMSB you have, what DOS
       you boot, whether or not you have the R: device driver loaded, etc
       etc).
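
       As a rough illustration of the layout described in this section,
       here is a minimal Python sketch that splits an unencrypted SAVEd
       file into lines. It is not how listamsb does it. The name
       parse_amsb is made up for this example, and the code assumes a $00
       byte never appears inside a line's tokens (which the "terminated
       by a $00 byte" rule implies, but which hasn't been checked here
       against every token).

```python
import struct


def parse_amsb(data: bytes):
    """Split an unencrypted SAVEd AMSB file into (line_number, tokens) pairs.

    Hypothetical helper, written only from the layout described above.
    """
    if len(data) < 3:
        raise ValueError("too short for the 3-byte header")

    locked = data[0]
    (length,) = struct.unpack_from("<H", data, 1)  # LSB first

    if locked not in (0, 1):
        raise ValueError("first byte is not 0 or 1: probably a LISTed text file")
    if locked == 1:
        raise ValueError("LOCKed program: the body must be decrypted first")
    if length != len(data) - 3:
        raise ValueError(
            f"header says {length} bytes, but {len(data) - 3} follow the header"
        )

    lines = []
    pos = 3
    while True:
        if pos + 2 > len(data):
            raise ValueError("no $0000 end marker found")
        (end_ptr,) = struct.unpack_from("<H", data, pos)
        if end_ptr == 0:
            break  # the $0000 "line" marks the end of the program
        if pos + 4 > len(data):
            raise ValueError("truncated line header")
        # end_ptr is otherwise unused, just as AMSB ignores it on LOAD.
        (lineno,) = struct.unpack_from("<H", data, pos + 2)
        term = data.find(b"\x00", pos + 4)  # assumption: first $00 ends the line
        if term == -1:
            raise ValueError(f"line {lineno} has no $00 terminator")
        lines.append((lineno, data[pos + 4:term]))
        pos = term + 1
    return lines
```

       Each returned pair is the line number and the raw, still-tokenized
       bytes. Turning those into text needs the token tables from
       amsbtok.h.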
   Redundant Tokens
       There are two separate tokens each for PRINT and AT:

       +----+-------+
       |$ab | PRINT |
       +----+-------+
       |$ac | PRINT |
       +----+-------+
       |$df | AT(   |
       +----+-------+
       |$e0 | AT    |
       +----+-------+

       When tokenizing a line, AMSB will actually use the $ab token if
       there's a space after PRINT (or ?), otherwise it will use the $ac
       token. These lines actually get tokenized differently:

              10 PRINT "HELLO"
              10 PRINT"HELLO"

       Same applies to the $df and $e0 AT tokens: if the user entered
       "AT(X,Y)", $df is used. Otherwise, with "AT (X,Y)", $e0 is used
       (followed by an ASCII left parenthesis).

       Three tokens include the opening parenthesis:

       +----+------+
       |$d2 | TAB( |
       +----+------+
       |$d6 | SPC( |
       +----+------+
       |$df | AT(  |
       +----+------+

       Normally in AMSB, it's OK to leave a space between a function name
       and the left-paren. PEEK (123) and SIN (1) are both valid. However,
       for SPC and TAB, no space is allowed, because the ( is part of the
       token. AT would be the same way, except there's a separate token
       $e0 that includes the space. Weird, huh?

       A side effect of this is that "SPC (10)" or "TAB (10)" won't be
       treated as a function call. Instead, the SPC or TAB is treated as a
       variable name. If you write:

              PRINT TAB (10);"HELLO"

       ...it'll print " 0 HELLO" at the start of the line[*], instead of
       "HELLO" in the 10th column as you might have expected.

       It also means that AT, TAB, and SPC are valid variable names in
       AMSB, which is an exception to the rule that keywords can't be used
       as variable names (e.g. SIN=1 or STRING$="HELLO" are invalid).

       [*] Unless you've assigned another value to TAB, of course.

   Unused Tokens
       If you look at the token list in amsbtok.h (or in a hex dump of the
       AMSB executable or cartridge image), you'll see a lot of
       double-quotes mixed in with the list.
       AMSB doesn't actually tokenize the " character (it's stored as $22,
       its ASCII value), so these seem to be placeholders, either because
       some tokens were deleted from the language during its development,
       or else they're intended for some future version of AMSB that never
       happened.

       The weird quote tokens are $99, $c8 to $d0, $d5, and $e7 to $ed. If
       you hexedit a program to replace a regular double-quote with one of
       these tokens, it will list as either "" or just one ", but it will
       cause a syntax error at runtime.

   LOADing Untokenized Files
       If the first byte of the file is anything other than $00 or $01,
       AMSB's LOAD command reads it in as a text file (LISTed rather than
       SAVEd).

       When LOAD is reading a text file, if the last byte of the file
       isn't an ATASCII EOL ($9b), you'll get #136 ERROR. The program
       doesn't get deleted, but the last line of the file doesn't get
       loaded. This could happen if a LISTed file somehow got truncated.

       While on the subject... the manual doesn't mention it, but if you
       LOAD a text file without line numbers, the code gets executed in
       direct mode during the load (like Atari BASIC's ENTER command
       does). This means you could write scripts (batch files) for AMSB...
       though you'd be better off using MERGE, rather than LOAD (MERGE is
       basically the same thing as Atari BASIC's ENTER).

   Program Length Header Mismatch
       When AMSB's LOAD command executes, it reads the 3-byte header, then
       reads as many bytes as the header's program length says.

       If the header length is longer than the rest of the file, you get a
       #136 ERROR (aka Atari's EOF), and the partially loaded program is
       erased (basically it does a NEW).

       If the length is shorter than the program, it'll stop loading no
       matter how much more data is in the file. This means it can stop in
       the middle of a line. It also means, if there was already a program
       in memory that was longer than the program length, you get a
       "hybrid" mix of the new program followed by the remainder of the
       old one.
       This is because the three $00 bytes at the end of the program
       weren't read in.

       If the program length is correct for the actual program (so the
       three $00 bytes get read), but there's extra data appended to the
       file, AMSB will never read the extra data at all.

   String Limitations
       String literals in AMSB cannot contain the | or ATASCII heart
       characters.

       AMSB uses | as a terminator for quoted strings, e.g. "STRING" will
       be tokenized as:

              "STRING|

       If you try to use a | in a quoted string, it gets turned into a
       double quote:

              "FOO|BAR"

       comes out as

              "FOO"BAR

       which is a syntax error! String variables can store | but only with
       e.g. CHR$(124) or reading from a file: it's string literals that
       don't allow it.

       The reason | is used for a terminating quote is to allow doubling
       up the quotes to embed them in a string:

              A$ = "HAS ""QUOTES"""
              PRINT A$

       will print:

              HAS "QUOTES"

       At first I thought "no pipe characters in strings, WTF man?" but
       it's probably no worse than Atari BASIC's "no quotes in string
       constants" rule. It would be nice if the AMSB manual actually
       documented the fact that | can't occur in a string constant. Not
       documenting it makes it a bug... and they have unused tokens in the
       $Fx range, I don't see why they had to use a printing character for
       this.

       You also can't put a heart (ATASCII character 0) in a string
       literal. It will be treated as the end of the line, as though you
       pressed Enter (and anything else on the line is ignored). This
       isn't documented in the manual, either.

       Like the | character, you can use CHR$(0) to store a heart in a
       string and it will work correctly.

   Line Number Range
       AMSB doesn't allow entering line numbers above 63999, but if a file
       is e.g. hex-edited to have a line number that's out of range, it
       will LIST and RUN just fine... except that it's impossible to GOTO
       or GOSUB to an out-of-range line. It will still execute if program
       flow falls into it.

   Differences Between Versions
       The language is the same in AMSB versions 1 and 2.
       Tokenized files made by one version will LOAD and RUN in the other
       version.

       Version 1, the disk version, always has the full set of commands
       available.

       Version 2, the cart, only has the full set if the extension disk is
       booted. The missing ones still get tokenized, but you get SN ERROR
       at runtime if you try to execute them. This doesn't affect the
       detokenizer at all.

       The missing commands:

              AUTO
              DEF (string version only)
              NOTE
              RENUM
              TRON
              TROFF
              DEL
              USING
              STRING$ (function)

       RENUM only works in direct mode, not in a program. Executing it
       gives a FUNCTION CALL ERROR.

       AUTO is (oddly) allowed in a program. Executing it exits the
       program and puts you back in the editor, in auto-numbering mode.

       It would seem weird to have POINT available but not NOTE... except
       that AMSB doesn't even have POINT. Instead, the disk addresses
       returned by NOTE are used with AT() in a PRINT statement. Not sure
       if AT() works without the extensions loaded, but it won't be useful
       anyway without NOTE.

       One other difference between versions 1 and 2: version 2 will LOAD
       and RUN the file D:AUTORUN.AMB at startup, if it exists.

   Colon Weirdness
       AMSB allows comments to be started with the ! and ' characters (as
       well as the traditional REM). For the ! and ' variety, if they come
       at the end of a line after some code, you don't have to put a
       colon. Example:

              10 GRAPHICS 2+16 ! NO TEXT

       However... in the tokenized format, there is a tokenized colon just
       before the tokenized ! or ' character. LIST doesn't display it. If
       you did put a colon:

              10 CLOSE #1:! WE'RE DONE

       ...then there will be two colons in the tokenized file, and only
       one will be LISTed.

       The ELSE keyword works the same way. In this line:

              10 IF A THEN PRINT ELSE STOP

       ...there is actually a : character just before the token for ELSE.

       Even weirder: you can put as many colons in a row as you like, and
       AMSB will treat it like a single colon.
       This line of code is valid and runs correctly:

              10 PRINT "A"::::::PRINT "A"

       These colons are displayed normally in LIST output.

   Memory Usage
       On a 48K/64K Atari, FRE(0) for AMSB 1 with DOS booted (since you
       can't use it without) but no device drivers is 21020. MEMLO is
       awfully high ($6a00).

       For AMSB 2 with DOS booted, but without the extensions loaded,
       FRE(0) is 24352. With extensions it's 20642 (even though the banner
       says 20644 BYTES FREE).

       AMSB 2 without DOS gives you 29980, but how are you gonna load or
       save programs without DOS? Nobody wants to use cassette, especially
       not people who could afford to buy the AMSB II cartridge.

   LOCKed Programs
       If you save a program with SAVE "filename" LOCK, it gets saved in
       an "encrypted" form. Loading a locked program disables LISTing or
       editing the program (you get LK ERROR if you try).

       The "encryption" is no better than ROT13. To encrypt, subtract each
       byte from 0x54 (in an 8-bit register, using two's complement). To
       decrypt, do the same. This is a reciprocal cipher, and you can
       think of it as the binary equivalent of ROT13.

       You can tell a LOCKed program because its first byte will be 1
       instead of 0. The next 2 bytes (the program length) are
       unencrypted. The rest of the file is encrypted with the lame scheme
       described above.

       When AMSB has a LOCKed program loaded into memory, it's not stored
       encrypted in RAM. It would be perfectly possible to write BASIC
       code using direct mode to write the tokenized program out to disk.
       The program starts at MEMLO and extends up to the first occurrence
       of three $00 bytes. The hardest part of this would be generating
       the header using only direct-mode BASIC statements (but it could be
       done).

       However... there's no need to do that. AMSB has a flag that tells
       it whether or not the currently-loaded program is LOCKed. You can
       just clear the flag:

              POKE 168,0

       Now AMSB won't consider the program LOCKed, and you can SAVE a
       regular copy of it (and LIST, edit, etc).
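
       For files you'd rather fix outside the Atari, the cipher described
       in this section can be sketched in a few lines of Python. This is
       only an illustration built from the description above: the names
       crypt and unlock are invented for the example, and it assumes that
       everything after the 3-byte header is encrypted, trailing $00
       bytes included.

```python
KEY = 0x54  # the constant each byte is subtracted from


def crypt(body: bytes) -> bytes:
    """Encrypt or decrypt: the operation is its own inverse."""
    # "& 0xFF" mimics the wraparound of an 8-bit register.
    return bytes((KEY - b) & 0xFF for b in body)


def unlock(data: bytes) -> bytes:
    """Return an unLOCKed copy of a SAVEd file (hypothetical helper)."""
    if len(data) < 3:
        raise ValueError("too short for the 3-byte header")
    if data[0] == 0:
        return data  # already a normal program
    if data[0] != 1:
        raise ValueError("not a tokenized AMSB file")
    # The flag byte becomes 0; the two length bytes were never encrypted.
    return b"\x00" + data[1:3] + crypt(data[3:])
```

       Because the same subtraction both encrypts and decrypts, running
       crypt over its own output gives back the original bytes.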
   Line Length Limit
       In the editor, after a POKE 82,0 (to set the left margin to 0), you
       can enter 120 characters (3 screen lines) on a logical line. If you
       enter a program line that way without a space after the line
       number, then LIST it, it will be 121 characters long, because AMSB
       will display a space after the line number.

       If you use a text editor (or write a program) to create an
       untokenized BASIC program, you can have a line of code that's 125
       characters long. AMSB will accept it just fine, with LOAD. If a
       line is 126 characters or longer, AMSB will silently ignore that
       line when LOADing.

       If you create a 125-character line (with a text editor) consisting
       only of a comment that begins with ! or ', without a space after
       the line number, LOAD it, then SAVE it, that line will be 129 bytes
       long in tokenized form. AMSB will LOAD it with no problems.

       If you hex-edit a SAVEd file to create a longer line, AMSB will
       accept that, too... up to 255 bytes. At 256 bytes, AMSB will lock
       up after LOAD.

   Crunching
       AMSB stores spaces in the tokenized program, just like other 8-bit
       MS BASICs do, but it requires you to put spaces between keywords
       and variables (unlike e.g. Commodore 64 BASIC). This seems to be
       because AMSB allows keywords inside of variable names: you can have
       a variable called LIFE (which contains the keyword IF) in AMSB, but
       you can't in C=64 BASIC (which gives a syntax error because it sees
       "L IF E").

       This applies to numbers, too: POKE710,0 is a syntax error in AMSB.
       This is because POKE710 is actually a valid variable name: try
       POKE710=123 followed by PRINT POKE710.

       However, the spaces aren't needed when the program is RUN. It would
       be possible to remove all the spaces outside of strings or comments
       and the program would still work fine.

COPYRIGHT
       WTFPL. See http://www.wtfpl.net/txt/copying for details.

AUTHOR
       B. Watson
       Email: urchlay@slackware.uk
       IRC: Urchlay on irc.libera.chat ##atari.

0.2.2                          2025-03-13                         AMSB(7)