Atari Microsoft BASIC Notes
---------------------------

AMSB is actually a pretty cool BASIC for the Atari 8-bit. I never got
the chance to use it 'back in the day' because it was expensive,
required a floppy drive and at least 32K of RAM (my poor 400 had a
tape drive for the first few years), and then later on, there was
Turbo BASIC XL, which was cooler than AMSB, and also freeware.

This file is a collection of notes I made to myself while developing
listamsb. The information here might be useful (e.g. if you're trying
to repair a damaged AMSB file) and hopefully is interesting. Enjoy!

This file is part of the bw-atari8-tools source. You can get the
latest version of the source from:

https://slackware.uk/~urchlay/repos/bw-atari8-tools

...which you can either view with a web browser or use with the 'git
clone' command.

-- B. Watson

Tokenized file format
---------------------

File begins with a 3-byte header:

offset | purpose
-------+-------------------------------------------------------
   0   | 0 for a normal program, 1 for LOCKed (encrypted).
   1   | LSB, program length, not counting the 3-byte header...
   2   | MSB,    "       "

The program length should always be the actual file size minus 3. If
it's not, the file has either been truncated or had junk added to the
end.

In a LOCKed program, the program length bytes are not encrypted.

After the header come the lines of code (encrypted, for LOCKed
programs). Each line has a 4-byte header:

offset | purpose
-------+-------------------------------------------------------
   0   | LSB, address of the last byte of this line...
   1   | MSB, address    ...which is ignored on LOAD!
   2   | LSB of line number
   3   | MSB  "   "    "

The rest of the line is the tokens, terminated by a $00 byte.

The next 2 bytes after the $00 are the last-byte offset of the next
line. The last "line" of the program has a $0000 offset, which
indicates the end of the program.
Since the actual last line ends with a $00, that means there will be
three $00 bytes in a row as the last 3 bytes of the file. And that's
the *only* place 3 $00's in a row will occur.

Tokenization is "lightweight": there are no tokenized numerics,
they're just stored as ASCII characters, as typed. There's no "string
constant follows" token like there is in Atari BASIC (well, there is,
it's just a double-quote, $22. There's no length byte). Variable names
are not tokenized, either, they're just stored as-is (name in ASCII,
including trailing $ for strings, etc). Numeric constants are just
stored as ASCII digits, just as you typed them.

In fact the only things that are tokenized are BASIC keywords:
commands and functions... NOT including user functions defined with
DEF (those are stored as just the ASCII function name, like
variables).

There are 2 sets of tokens. One set is single-byte, $80 and up. These
are commands. The other set is functions, which are 2 bytes: $FF
followed by the token number. See amsbtok.h in the source for the
actual tokens.

AMSB saves the end-of-line pointers, but it totally ignores them on
LOAD. The SAVEd file format does *not* have a load address (as e.g.
Commodore BASIC does), so there's no way to know the address of the
start of the program (other than counting backwards from the next
line, since its address is known). It's not just a constant either: it
depends on what MEMLO was set to when the program was saved (which
varies depending on what version of AMSB you have, what DOS you boot,
whether or not you have the R: device driver loaded, etc etc).

Redundant Tokens
----------------

There are two separate tokens each for PRINT and AT:

token | text
------+-----------------------
 $ab  | "PRINT "
 $ac  | "PRINT"
 $df  | "AT("
 $e0  | "AT "

When tokenizing a line, AMSB will actually use the $ab token if
there's a space after PRINT (or ?), otherwise it will use the $ac
token.
These lines actually get tokenized differently:

    10 PRINT "HELLO"
    10 PRINT"HELLO"

Same applies to the $df and $e0 AT tokens: if the user entered
"AT(X,Y)", $df is used. Otherwise, with "AT (X,Y)", $e0 is used
(followed by an ASCII left parenthesis).

3 tokens include the opening parenthesis:

token | text
------+-----------------------
 $d2  | "TAB("
 $d6  | "SPC("
 $df  | "AT("

Normally in AMSB, it's OK to leave a space between a function name and
the left-paren. PEEK (123) and SIN (1) are both valid. However, for
SPC and TAB, no space is allowed, because the ( is part of the token.
AT would be the same way, except there's a separate token $e0 that
*includes* the space. Weird, huh?

A side effect of this is that "SPC (10)" or "TAB (10)" won't be
treated as a function call. Instead, the SPC or TAB is treated as a
variable name. If you write:

    PRINT TAB (10);"HELLO"

...it'll print " 0 HELLO" at the start of the line[*], instead of
"HELLO" in the 10th column as you might have expected. It also means
that AT, TAB, and SPC are valid variable names in AMSB, which is an
exception to the rule that keywords can't be used as variable names
(e.g. SIN=1 or STRING$="HELLO" are invalid).

[*] Unless you've assigned another value to TAB, of course.

Unused Tokens
-------------

If you look at the token list in amsbtok.h (or in a hex dump of the
AMSB executable or cartridge image), you'll see a lot of double-quotes
mixed in with the list. AMSB doesn't actually tokenize the " character
(it's stored as $22, its ASCII value), so these seem to be
placeholders, either because some tokens were deleted from the
language during its development, or else they're intended for some
future version of AMSB that never happened.

The weird quote tokens are $99, $c8 to $d0, $d5, and $e7 to $ed. If
you hexedit a program to replace a regular double-quote with one of
these tokens, it will list as either "" or just one ", but it will
cause a syntax error at runtime.
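
Pulling together the layout from the "Tokenized file format" section
(3-byte header, 4-byte line headers, $00 terminators, $0000 end
marker), here's a rough Python sketch of splitting an unLOCKed SAVEd
file into lines. It is only an illustration, not how listamsb does it:
the function name split_lines and its return shape are made up here,
and it doesn't try to detokenize anything.

```python
def split_lines(data):
    """Return [(line_number, token_bytes), ...] for an unLOCKed SAVEd file.

    Hypothetical helper for illustration; not taken from listamsb.
    """
    if len(data) < 3:
        raise ValueError("too short to hold the 3-byte header")
    flag = data[0]
    proglen = int.from_bytes(data[1:3], "little")
    if flag == 1:
        raise ValueError("LOCKed program: decrypt it first")
    if flag != 0:
        raise ValueError("not a SAVEd file (maybe LISTed text)")
    if proglen != len(data) - 3:
        raise ValueError("header says %d bytes, file has %d"
                         % (proglen, len(data) - 3))

    lines = []
    pos = 3
    while True:
        if pos + 2 > len(data):
            raise ValueError("ran off the end without the $0000 marker")
        endptr = int.from_bytes(data[pos:pos + 2], "little")
        if endptr == 0:                  # $0000 pointer: end of program
            break
        if pos + 4 > len(data):
            raise ValueError("truncated line header")
        lineno = int.from_bytes(data[pos + 2:pos + 4], "little")
        start = pos + 4                  # skip the header, which may hold $00s
        end = data.find(b"\x00", start)  # tokens run up to the $00
        if end < 0:
            raise ValueError("line %d has no $00 terminator" % lineno)
        lines.append((lineno, data[start:end]))
        pos = end + 1                    # endptr is otherwise ignored, as on LOAD
    return lines
```

Like LOAD, this only checks the end-of-line pointer for $0000 and
otherwise finds each line's end by scanning for the $00 terminator.
The length check is stricter than AMSB's own (see "Program Length
Header Mismatch" below), which is what you want for spotting damaged
files.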
LOADing Untokenized Files
-------------------------

If the first byte of the file is anything other than $00 or $01,
AMSB's LOAD command reads it in as a text file (LISTed rather than
SAVEd).

When LOAD is reading a text file, if the last byte of the file isn't
an ATASCII EOL ($9b), you'll get #136 ERROR. The program doesn't get
deleted, but the last line of the file doesn't get loaded. This could
happen if a LISTed file somehow got truncated.

While on the subject... the manual doesn't mention it, but if you LOAD
a text file without line numbers, the code gets executed in direct
mode during the load (like Atari BASIC's ENTER command does). This
means you could write scripts (batch files) for AMSB... though you'd
be better off using MERGE, rather than LOAD (MERGE is basically the
same thing as Atari BASIC's ENTER).

Program Length Header Mismatch
------------------------------

When AMSB's LOAD command executes, it reads the 3-byte header, then
reads as many bytes as the header's program length says.

If the header length is longer than the rest of the file, you get a
#136 ERROR (aka Atari's EOF), and the partially loaded program is
erased (basically it does a NEW).

If the length is shorter than the program, it'll stop loading no
matter how much more data is in the file. This means it can stop in
the middle of a line. It also means, if there was already a program in
memory that was longer than the program length, you get a "hybrid" mix
of the new program followed by the remainder of the old one. This is
because the three $00 bytes at the end of the program weren't read in.

If the program length is correct for the actual program (so the three
$00 bytes get read), but there's extra data appended to the file, AMSB
will never read the extra data at all.

String Limitations
------------------

String literals in AMSB cannot contain the | or ATASCII heart
characters.

AMSB uses | as a terminator for quoted strings, e.g.

    "STRING"

will be tokenized as:

    "STRING|

If you try to use a | in a quoted string, it gets turned into a double
quote:

    "FOO|BAR"

comes out as

    "FOO"BAR

which is a syntax error! String variables can store | but only with
e.g. CHR$(124) or reading from a file: it's string *literals* that
don't allow it.

The reason | is used for a terminating quote is to allow doubling up
the quotes to embed them in a string:

    A$ = "HAS ""QUOTES"""
    PRINT A$

will print:

    HAS "QUOTES"

At first I thought "no pipe characters in strings, WTF man?" but it's
probably no worse than Atari BASIC's "no quotes in string constants"
rule. It *would* be nice if the AMSB manual actually documented the
fact that | can't occur in a string constant. Not documenting it makes
it a bug... and they have unused tokens in the $Fx range, I don't see
why they had to use a printing character for this.

You also can't put a heart (ATASCII character 0) in a string literal.
It will be treated as the end of the line, as though you pressed Enter
(and anything else on the line is ignored). This isn't documented in
the manual, either. Like the | character, you can use CHR$(0) to store
a heart in a string and it will work correctly.

Line Number Range
-----------------

AMSB doesn't allow entering line numbers above 63999, but if a file is
e.g. hex-edited to have a line number that's out of range, it will
LIST and RUN just fine... except that it's impossible to GOTO or GOSUB
to an out-of-range line. It will still execute if program flow falls
into it.

Differences Between Versions
----------------------------

The language is the same in AMSB versions 1 and 2. Tokenized files
made by one version will LOAD and RUN in the other version.

Version 1, the disk version, always has the full set of commands
available. Version 2, the cart, only has the full set if the extension
disk is booted. The missing ones still get tokenized, but you get SN
ERROR at runtime if you try to execute them. This doesn't affect the
detokenizer at all.
The missing commands:

    AUTO
    DEF (the string version; numeric is still present)
    NOTE
    RENUM
    TRON
    TROFF
    DEL
    USING
    STRING$ (function, not a command)

RENUM only works in direct mode, not a program. Executing it gives a
FUNCTION CALL ERROR.

AUTO is (oddly) allowed in a program. Executing it exits the program
and puts you back in the editor, in auto-numbering mode.

It would seem weird to have POINT available but not NOTE... except
that AMSB doesn't even *have* POINT. Instead, the disk addresses
returned by NOTE are used with AT() in a PRINT statement. Not sure if
AT() works without the extensions loaded, but it won't be useful
anyway without NOTE.

One other difference between versions 1 and 2: version 2 will LOAD and
RUN the file D:AUTORUN.AMB at startup, if it exists.

Colon Weirdness
---------------

AMSB allows comments to be started with the ! and ' characters (as
well as the traditional REM). For the ! and ' variety, if they come at
the end of a line after some code, you don't have to put a colon.
Example:

    10 GRAPHICS 2+16 ! NO TEXT WINDOW

However... in the tokenized format, there *is* a tokenized colon just
before the tokenized ! or ' character. LIST doesn't display it. If you
did put a colon:

    10 CLOSE #1:! WE'RE DONE WITH THE FILE

...then there will be *two* colons in the tokenized file, and only one
will be LISTed.

The ELSE keyword works the same way. In this line:

    10 IF A THEN PRINT ELSE STOP

...there is actually a : character just before the token for ELSE.

Even weirder: you can put as many colons in a row as you like, and
AMSB will treat it like a single colon. This line of code is valid and
runs correctly:

    10 PRINT "FOO"::::::PRINT "BAR"

These colons are displayed normally in LIST output.

Memory Usage
------------

On a 48K/64K Atari, FRE(0) for AMSB 1 with DOS booted (since you can't
use it without) but no device drivers is 21020. MEMLO is awfully high
($6a00).

For AMSB 2 with DOS booted, but without the extensions loaded, FRE(0)
is 24352.
With extensions it's 20642 (even though the banner says 20644 BYTES
FREE).

AMSB 2 without DOS gives you 29980, but how are you gonna load or save
programs without DOS? Nobody wants to use cassette, especially not
people who could afford to buy the AMSB II cartridge.

LOCKed Programs
---------------

If you save a program with SAVE "filename" LOCK, it gets saved in an
"encrypted" form. Loading a locked program disables LISTing or editing
the program (you get LK ERROR if you try).

The "encryption" is no better than ROT13. To encrypt, subtract each
byte from 0x54 (in an 8-bit register, using twos complement). To
decrypt, do the same. This is a reciprocal cipher, and you can think
of it as the binary equivalent of ROT13.

You can tell a LOCKed program because its first byte will be 1 instead
of 0. The next 2 bytes (the program length) are unencrypted. The rest
of the file is encrypted with the lame scheme described above.

When AMSB has a LOCKed program loaded into memory, it's *not* stored
encrypted in RAM. It would be perfectly possible to write BASIC code
using direct mode to write the tokenized program out to disk. The
program starts at MEMLO and extends up to the first occurrence of
three $00 bytes. The hardest part of this would be generating the
header using only direct-mode BASIC statements (but it could be done).

However... there's no need to do that. AMSB has a flag that tells it
whether or not the currently-loaded program is LOCKed. You can just
clear the flag:

    POKE 168,0

Now AMSB won't consider the program LOCKed, and you can SAVE a regular
copy of it (and LIST, edit, etc).

Line Length Limit
-----------------

In the editor, after a POKE 82,0 (to set the left margin to 0), you
can enter 120 characters (3 screen lines) on a logical line. If you
enter a program line that way *without* a space after the line number,
then LIST it, it will be 121 characters long, because AMSB will
display a space after the line number.
If you use a text editor (or write a program) to create an untokenized
BASIC program, you can have a line of code that's 125 characters long.
AMSB will accept it just fine, with LOAD. If a line is 126 characters
or longer, AMSB will silently ignore that line when LOADing.

If you create a 125-character line (with a text editor) consisting
only of a comment that begins with ! or ', without a space after the
line number, LOAD it, then SAVE it, that line will be 129 bytes long
in tokenized form. AMSB will LOAD it with no problems.

If you hex-edit a SAVEd file to create a longer line, AMSB will accept
that, too... up to 255 bytes. At 256 bytes, AMSB will lock up after
LOAD.

Crunching
---------

AMSB stores spaces in the tokenized program, just like other 8-bit MS
BASICs do, but it requires you to put spaces between keywords and
variables (unlike e.g. Commodore 64 BASIC). This seems to be because
AMSB allows keywords inside of variable names: you can have a variable
called LIFE (which contains the keyword IF) in AMSB, but you can't in
C=64 BASIC (which gives a syntax error because it sees "L IF E").

This applies to numbers, too: POKE710,0 is a syntax error in AMSB.
This is because POKE710 is actually a valid variable name: try
POKE710=123 followed by PRINT POKE710.

However. The spaces aren't needed when the program is RUN. It would be
possible to remove all the spaces outside of strings or comments and
the program would still work fine.
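
One last sketch, going back to the "LOCKed Programs" section: any tool
that works on SAVEd files (a cruncher, say) has to undo the LOCK
"encryption" first, and outside of AMSB that only takes a few lines of
Python. The function name unlock is made up for illustration. The
cipher itself is the subtract-from-0x54 scheme described in that
section, and because it's reciprocal the same expression would also
encrypt.

```python
def unlock(data):
    """Return an unLOCKed copy of a SAVEd AMSB file's bytes.

    Hypothetical helper for illustration. Files that aren't LOCKed
    (first byte not 1) are returned unchanged.
    """
    if len(data) < 3:
        raise ValueError("too short to hold the 3-byte header")
    if data[0] != 1:
        return bytes(data)
    # Subtract each byte from 0x54, wrapping around in 8 bits.
    body = bytes((0x54 - b) & 0xFF for b in data[3:])
    # The flag byte becomes 0; the 2 length bytes were never encrypted.
    return b"\x00" + bytes(data[1:3]) + body
```

Something like unlock(open("PROG.AMB", "rb").read()) would give you
bytes you can write back out as a normal SAVEd file (PROG.AMB being a
made-up filename).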