Diffstat (limited to 'amsb_content.rst')
-rw-r--r-- amsb_content.rst 406
1 files changed, 406 insertions, 0 deletions
diff --git a/amsb_content.rst b/amsb_content.rst
new file mode 100644
index 0000000..51cbb25
--- /dev/null
+++ b/amsb_content.rst
@@ -0,0 +1,406 @@
+DESCRIPTION
+===========
+
+Atari Microsoft BASIC (AMSB) is actually a pretty cool BASIC for the
+Atari 8-bit. I never got the chance to use it 'back in the day' because
+it was expensive and required a floppy drive and at least 32K of RAM (my
+poor 400 had a tape drive for the first few years), and then later on,
+there was Turbo BASIC XL, which was cooler than AMSB, and also freeware.
+
+This file is a collection of notes I made to myself while developing
+listamsb. The information here might be useful (e.g. if you're trying
+to repair a damaged AMSB file) and hopefully is interesting. Enjoy!
+
+This file is part of the bw-atari8-tools source. You can get the
+latest version of the source from:
+
+https://slackware.uk/~urchlay/repos/bw-atari8-tools
+
+...which you can either view with a web browser or use with the 'git
+clone' command.
+
+NOTES
+=====
+
+Tokenized file format
+---------------------
+
+File begins with a 3-byte header:
+
+.. csv-table::
+
+ "Offset", "Purpose"
+ "0", "0 for a normal program, 1 for LOCKed (encrypted)"
+ "1", "LSB, program length, not counting the 3-byte header"
+ "2", "MSB, program length"
+
+The program length should always be the actual file size minus 3. If
+it's not, the file has either been truncated or had junk added to the
+end. In a LOCKed program, the program length bytes are not encrypted.
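+
+The header check is easy to sketch in Python. This is just an
+illustration, not code from listamsb, and the function names are made
+up for the example:
+

```python
import struct


def parse_amsb_header(data: bytes):
    """Return (locked, declared_length, actual_length) for a SAVEd file."""
    if len(data) < 3:
        raise ValueError("file is too short to hold the 3-byte header")
    # Byte 0 is the LOCK flag, bytes 1-2 are the little-endian program length.
    flag, declared = struct.unpack_from("<BH", data, 0)
    if flag not in (0, 1):
        raise ValueError("first byte is not 0 or 1: not a tokenized file")
    return flag == 1, declared, len(data) - 3


def describe_length(declared: int, actual: int) -> str:
    """Compare the header's program length with the real payload size."""
    if declared == actual:
        return "ok"
    if declared > actual:
        return "truncated"
    return "trailing junk"
```

+
+The length bytes are read as-is even for LOCKed files, since they are
+never encrypted.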
+
+After the header, the lines of code (encrypted, for LOCKed programs).
+Each line has a 4-byte header:
+
+.. csv-table::
+
+ "Offset", "Purpose"
+ "0", "LSB, address of the last byte of this line (ignored on LOAD!)"
+ "1", "MSB, address of the last byte of this line"
+ "2", "LSB, line number"
+ "3", "MSB, line number"
+
+The rest of the line is the tokens, terminated by a $00 byte. The
+next 2 bytes after the $00 are the last-byte address of the next line.
+
+The last "line" of the program has a $0000 address, which indicates the
+end of the program. Since the actual last line ends with a $00, that
+means there will be three $00 bytes in a row as the last 3 bytes of
+the file. And that's the *only* place 3 $00's in a row will occur.
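+
+Walking the lines of an unLOCKed file could look something like this
+in Python. It's a rough sketch under the layout described above:
+iter_lines is a made-up name, and like LOAD it reads the end-of-line
+address only to spot the $0000 end marker:
+

```python
def iter_lines(data: bytes):
    """Yield (line_number, token_bytes) for each line of an unLOCKed file."""
    pos = 3  # skip the 3-byte file header
    while pos + 2 <= len(data):
        end_addr = data[pos] | (data[pos + 1] << 8)
        if end_addr == 0:
            return  # $0000 marks the end of the program
        if pos + 4 > len(data):
            raise ValueError("file ends inside a line header")
        # The line number is read before scanning for $00, because the
        # number itself may contain a zero byte (e.g. line 256).
        line_no = data[pos + 2] | (data[pos + 3] << 8)
        start = pos + 4
        stop = data.find(b"\x00", start)
        if stop == -1:
            raise ValueError("line %d has no $00 terminator" % line_no)
        yield line_no, data[start:stop]
        pos = stop + 1
```
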
+
+Tokenization is "lightweight": numeric constants aren't tokenized,
+they're just stored as ASCII digits, as typed. There's no "string
+constant follows" token like there is in Atari BASIC (well, there is,
+it's just a double-quote, $22. There's no length byte). Variable names
+are not tokenized, either, they're just stored as-is (name in ASCII,
+including trailing $ for strings, etc).
+
+In fact the only things that are tokenized are BASIC keywords:
+commands and functions... NOT including user functions defined
+with DEF (those are stored as just the ASCII function name, like
+variables).
+
+There are 2 sets of tokens. One set is single-byte, $80 and up.
+These are commands. The other set is functions, which are 2 bytes:
+$FF followed by the token number. See amsbtok.h in the source for the
+actual tokens.
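+
+To make the two token sets concrete, here's a small Python sketch that
+splits a line's bytes into command tokens, function tokens and plain
+text. split_tokens is a made-up name, and the sketch assumes it's fine
+to ignore string literals and comments, where bytes of $80 and up are
+probably just ATASCII characters rather than tokens:
+

```python
def split_tokens(body: bytes):
    """Yield ('command', n), ('function', n) or ('text', byte) items."""
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0xFF and i + 1 < len(body):
            # Two-byte function token: $FF followed by the token number.
            yield ("function", body[i + 1])
            i += 2
        elif b >= 0x80:
            # Single-byte command token (a lone trailing $FF also lands here).
            yield ("command", b)
            i += 1
        else:
            # Everything else is stored as plain ASCII.
            yield ("text", b)
            i += 1
```
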
+
+AMSB saves the end-of-line pointers, but it totally ignores them
+on LOAD. The SAVEd file format does *not* have a load address (as e.g.
+Commodore BASIC does), so there's no way to know the address of the
+start of the program (other than counting backwards from the next line,
+since its address is known). It's not just a constant either: it
+depends on what MEMLO was set to when the program was saved (which varies
+depending on what version of AMSB you have, what DOS you boot, whether
+or not you have the R: device driver loaded, etc etc).
+
+
+Redundant Tokens
+----------------
+
+There are two separate tokens each for PRINT and AT:
+
+.. csv-table::
+
+ "$ab", "PRINT "
+ "$ac", "PRINT"
+ "$df", "AT("
+ "$e0", "AT "
+
+When tokenizing a line, AMSB will actually use the $ab token if
+there's a space after PRINT (or ?), otherwise it will use the
+$ac token. These lines actually get tokenized differently::
+
+ 10 PRINT "HELLO"
+ 10 PRINT"HELLO"
+
+Same applies to the $df and $e0 AT tokens: if the user entered
+"AT(X,Y)", $df is used. Otherwise, with "AT (X,Y)", $e0 is used
+(followed by an ASCII left parenthesis).
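+
+The choice between the paired tokens can be modelled like this in
+Python. It only looks at the character right after the keyword, which
+is all the rule above needs; encode_print_or_at is a made-up name and
+this is not how AMSB's tokenizer is actually written:
+

```python
PRINT_SPACE, PRINT_BARE = 0xAB, 0xAC  # "PRINT " and "PRINT"
AT_PAREN, AT_SPACE = 0xDF, 0xE0       # "AT(" and "AT "


def encode_print_or_at(text: str):
    """Return (token, remaining_text), or None if neither keyword leads."""
    upper = text.upper()
    for word in ("PRINT", "?"):
        if upper.startswith(word):
            rest = text[len(word):]
            if rest.startswith(" "):
                # The space is part of the $ab token, so it is consumed.
                return PRINT_SPACE, rest[1:]
            return PRINT_BARE, rest
    if upper.startswith("AT("):
        # The left paren is part of the $df token.
        return AT_PAREN, text[3:]
    if upper.startswith("AT "):
        # The space is part of $e0; any paren stays behind as ASCII.
        return AT_SPACE, text[3:]
    return None
```
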
+
+3 tokens include the opening parenthesis:
+
+.. csv-table::
+
+ "$d2", "TAB("
+ "$d6", "SPC("
+ "$df", "AT("
+
+Normally in AMSB, it's OK to leave a space between a function name
+and the left-paren. PEEK (123) and SIN (1) are both valid. However,
+for SPC and TAB, no space is allowed, because the ( is part of the
+token. AT would be the same way, except there's a separate token $e0
+that *includes* the space. Weird, huh? A side effect of this is
+that "SPC (10)" or "TAB (10)" won't be treated as a function call.
+Instead, the SPC or TAB is treated as a variable name. If you write::
+
+ PRINT TAB (10);"HELLO"
+
+...it'll print " 0 HELLO" at the start of the line[*], instead of "HELLO"
+in the 10th column as you might have expected. It also means that AT,
+TAB, and SPC are valid variable names in AMSB, which is an exception
+to the rule that keywords can't be used as variable names (e.g. SIN=1
+or STRING$="HELLO" are invalid).
+
+[*] Unless you've assigned another value to TAB, of course.
+
+
+Unused Tokens
+-------------
+
+If you look at the token list in amsbtok.h (or in a hex dump
+of the AMSB executable or cartridge image), you'll see a lot of
+double-quotes mixed in with the list. AMSB doesn't actually tokenize
+the " character (it's stored as $22, its ASCII value), so these seem
+to be placeholders, either because some tokens were deleted from the
+language during its development, or else they're intended for some
+future version of AMSB that never happened.
+
+The weird quote tokens are $99, $c8 to $d0, $d5, and $e7 to $ed. If
+you hexedit a program to replace a regular double-quote with one of
+these tokens, it will list as either "" or just one ", but it will
+cause a syntax error at runtime.
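+
+If you wanted to scan a line for these placeholder tokens, a Python
+sketch might look like the following. The names are made up, and it
+will report false positives for bytes inside strings, comments, or the
+second byte of a $FF function token, since it doesn't track context:
+

```python
# $99, $c8 to $d0, $d5, and $e7 to $ed (both ranges inclusive).
QUOTE_PLACEHOLDERS = frozenset(
    [0x99, 0xD5] + list(range(0xC8, 0xD1)) + list(range(0xE7, 0xEE))
)


def find_placeholder_tokens(body: bytes):
    """Return (offset, byte) for every placeholder token value in body."""
    return [(i, b) for i, b in enumerate(body) if b in QUOTE_PLACEHOLDERS]
```
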
+
+
+LOADing Untokenized Files
+-------------------------
+
+If the first byte of the file is anything other than $00 or $01,
+AMSB's LOAD command reads it in as a text file (LISTed rather than
+SAVEd).
+
+When LOAD is reading a text file, if the last byte of the file isn't
+an ATASCII EOL ($9b), you'll get #136 ERROR. The program doesn't get
+deleted, but the last line of the file doesn't get loaded. This could
+happen if a LISTed file somehow got truncated.
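+
+Here's a Python sketch of how a tool might sort files the same way
+LOAD does, including the missing-EOL case. classify_for_load and its
+return strings are made up for the example:
+

```python
ATASCII_EOL = 0x9B


def classify_for_load(data: bytes) -> str:
    """Guess how AMSB's LOAD would treat a file, from its first/last byte."""
    if not data:
        return "empty"
    if data[0] == 0x00:
        return "tokenized"
    if data[0] == 0x01:
        return "locked"
    # Anything else is read as LISTed text, which must end with an EOL.
    if data[-1] != ATASCII_EOL:
        return "text, missing final EOL"
    return "text"
```
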
+
+While on the subject... the manual doesn't mention it, but if you LOAD
+a text file without line numbers, the code gets executed in direct
+mode during the load (like Atari BASIC's ENTER command does). This
+means you could write scripts (batch files) for AMSB... though you'd
+be better off using MERGE, rather than LOAD (MERGE is basically the
+same thing as Atari BASIC's ENTER).
+
+
+Program Length Header Mismatch
+------------------------------
+
+When AMSB's LOAD command executes, it reads the 3-byte header, then
+reads as many bytes as the header's program length says.
+
+If the header length is longer than the rest of the file, you get
+a #136 ERROR (aka Atari's EOF), and the partially loaded program is
+erased (basically it does a NEW).
+
+If the length is shorter than the program, it'll stop loading no
+matter how much more data is in the file. This means it can stop in
+the middle of a line. It also means, if there was already a program in
+memory that was longer than the program length, you get a "hybrid" mix
+of the new program followed by the remainder of the old one. This is
+because the three $00 bytes at the end of the program weren't read in.
+
+If the program length is correct for the actual program (so the three
+$00 bytes get read), but there's extra data appended to the file, AMSB
+will never read the extra data at all.
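+
+The three cases can be modelled with a toy Python function. It
+assumes a very simplified picture of memory (the old and new programs
+are flat byte strings starting at the same address); simulate_load is
+a made-up name:
+

```python
def simulate_load(old_program: bytes, file_data: bytes) -> bytes:
    """Model program memory after LOADing file_data over old_program."""
    if len(file_data) < 3:
        raise ValueError("file is too short to hold the 3-byte header")
    declared = file_data[1] | (file_data[2] << 8)
    payload = file_data[3:3 + declared]
    if len(payload) < declared:
        # Header promises more bytes than the file holds:
        # #136 ERROR, and the partial program is erased.
        return b""
    # Only `declared` bytes are read. Extra bytes in the file are ignored,
    # and old bytes past the new data are left where they were.
    return payload + old_program[len(payload):]
```

+
+With a too-short length, the old program's tail (including its three
+$00 bytes) is what ends up terminating the "hybrid" program.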
+
+
+String Limitations
+------------------
+
+String literals in AMSB cannot contain the | or ATASCII heart
+characters.
+
+AMSB uses | as a terminator for quoted strings, e.g. "STRING" will
+be tokenized as: "STRING|
+
+If you try to use a | in a quoted string, it gets turned into a double
+quote: "FOO|BAR" comes out as "FOO"BAR which is a syntax error!
+
+String variables can store | but only with e.g. CHR$(124) or reading
+from a file: it's string *literals* that don't allow it.
+
+The reason | is used for a terminating quote is to allow doubling up
+the quotes to embed them in a string::
+
+ A$ = "HAS ""QUOTES"""
+
+PRINT A$ will print: HAS "QUOTES"
+
+At first I thought "no pipe characters in strings, WTF man?" but it's
+probably no worse than Atari BASIC's "no quotes in string constants"
+rule. It *would* be nice if the AMSB manual actually documented the
+fact that | can't occur in a string constant. Not documenting it makes
+it a bug... and they have unused tokens in the $Fx range, I don't see
+why they had to use a printing character for this.
+
+You also can't put a heart (ATASCII character 0) in a string
+literal. It will be treated as the end of the line, as though you
+pressed Enter (and anything else on the line is ignored). This isn't
+documented in the manual, either.
+
+Like the | character, you can use CHR$(0) to store a heart in a string
+and it will work correctly.
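+
+A program generator could check for both characters before emitting a
+quoted string. A Python sketch, assuming the text is held as a Python
+str with one character per ATASCII code (so the heart is chr(0));
+the names are made up:
+

```python
# Characters a string literal can't hold, and the CHR$ call to use instead.
UNSAFE_IN_LITERAL = {"|": "CHR$(124)", "\x00": "CHR$(0)"}


def unsafe_literal_chars(text: str):
    """Return (index, replacement) for each char a literal can't contain."""
    return [
        (i, UNSAFE_IN_LITERAL[c])
        for i, c in enumerate(text)
        if c in UNSAFE_IN_LITERAL
    ]
```
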
+
+
+Line Number Range
+-----------------
+
+AMSB doesn't allow entering line numbers above 63999, but if a file
+is e.g. hex-edited to have a line number that's out of range, it will
+LIST and RUN just fine... except that it's impossible to GOTO or GOSUB
+to an out-of-range line. It will still execute if program flow falls
+into it.
+
+
+Differences Between Versions
+----------------------------
+
+The language is the same in AMSB versions 1 and 2. Tokenized files
+made by one version will LOAD and RUN in the other version.
+
+Version 1, the disk version, always has the full set of commands
+available. Version 2, the cart, only has the full set if the extension
+disk is booted. The missing ones still get tokenized, but you get SN
+ERROR at runtime if you try to execute them. This doesn't affect the
+detokenizer at all. The missing commands::
+
+ AUTO
+ DEF (string version only)
+ NOTE
+ RENUM
+ TRON
+ TROFF
+ DEL
+ USING
+ STRING$ (function)
+
+RENUM only works in direct mode, not in a program. Executing it
+gives a FUNCTION CALL ERROR.
+
+AUTO is (oddly) allowed in a program. Executing it exits the program
+and puts you back in the editor, in auto-numbering mode.
+
+It would seem weird to have POINT available but not NOTE... except
+that AMSB doesn't even *have* POINT. Instead, the disk addresses
+returned by NOTE are used with AT() in a PRINT statement. Not sure
+if AT() works without the extensions loaded, but it won't be useful
+anyway without NOTE.
+
+One other difference between versions 1 and 2: version 2 will LOAD and
+RUN the file D:AUTORUN.AMB at startup, if it exists.
+
+
+Colon Weirdness
+---------------
+
+AMSB allows comments to be started with the ! and ' characters (as
+well as the traditional REM). For the ! and ' variety, if they
+come at the end of a line after some code, you don't have to put a colon.
+Example::
+
+ 10 GRAPHICS 2+16 ! NO TEXT
+
+However... in the tokenized format, there *is* a tokenized colon
+just before the tokenized ! or ' character. LIST doesn't display it.
+If you did put a colon::
+
+ 10 CLOSE #1:! WE'RE DONE
+
+...then there will be *two* colons in the tokenized file, and only
+one will be LISTed.
+
+The ELSE keyword works the same way. In this line::
+
+ 10 IF A THEN PRINT ELSE STOP
+
+...there is actually a : character just before the token for ELSE.
+
+Even weirder: you can put as many colons in a row as you like, and
+AMSB will treat them like a single colon. This line of code is valid
+and runs correctly::
+
+ 10 PRINT "A"::::::PRINT "A"
+
+These colons are displayed normally in LIST output.
+
+
+Memory Usage
+------------
+
+On a 48K/64K Atari, FRE(0) for AMSB 1 with DOS booted (since you can't
+use it without) but no device drivers is 21020. MEMLO is awfully high
+($6a00).
+
+For AMSB 2 with DOS booted, but without the extensions loaded, FRE(0)
+is 24352. With extensions it's 20642 (even though the banner says 20644
+BYTES FREE).
+
+AMSB 2 without DOS gives you 29980, but how are you gonna load or save
+programs without DOS? Nobody wants to use cassette, especially not
+people who could afford to buy the AMSB II cartridge.
+
+
+LOCKed Programs
+---------------
+
+If you save a program with SAVE "filename" LOCK, it gets saved in an
+"encrypted" form. Loading a locked program disables LISTing or
+editing the program (you get LK ERROR if you try).
+
+The "encryption" is no better than ROT13. To encrypt, subtract each
+byte from $54 (in an 8-bit register, using two's complement). To
+decrypt, do the same. This is a reciprocal cipher, and you can think
+of it as the binary equivalent of ROT13.
+
+You can tell a LOCKed program because its first byte will be 1 instead
+of 0. The next 2 bytes (the program length) are unencrypted. The rest of
+the file is encrypted with the lame scheme described above.
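+
+Since the cipher is its own inverse, one small Python function can
+both lock and unlock a file. This is a sketch following the
+description above, with a made-up name (toggle_lock); it flips the
+flag byte, leaves the length alone, and transforms everything else:
+

```python
def toggle_lock(data: bytes) -> bytes:
    """Encrypt an unLOCKed file, or decrypt a LOCKed one."""
    if len(data) < 3 or data[0] not in (0, 1):
        raise ValueError("not a tokenized AMSB file")
    # Subtract each byte from $54, wrapping around at 8 bits.
    body = bytes((0x54 - b) & 0xFF for b in data[3:])
    # Flip the LOCK flag; the two length bytes are never encrypted.
    return bytes([data[0] ^ 1]) + data[1:3] + body
```

+
+Applying it twice gives back the original file, which is what makes
+this a reciprocal cipher.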
+
+When AMSB has a LOCKed program loaded into memory, it's *not* stored
+encrypted in RAM. It would be perfectly possible to write BASIC code
+using direct mode to write the tokenized program out to disk. The
+program starts at MEMLO and extends up to the first occurrence of
+three $00 bytes. The hardest part of this would be generating the
+header using only direct-mode BASIC statements (but it could be done).
+
+However... there's no need to do that. AMSB has a flag that tells it
+whether or not the currently-loaded program is LOCKed. You can just
+clear the flag::
+
+ POKE 168,0
+
+Now AMSB won't consider the program LOCKed, and you can SAVE a regular
+copy of it (and LIST, edit, etc).
+
+
+Line Length Limit
+-----------------
+
+In the editor, after a POKE 82,0 (to set the left margin to 0), you
+can enter 120 characters (3 screen lines) on a logical line. If you
+enter a program line that way *without* a space after the line number,
+then LIST it, it will be 121 characters long, because AMSB will
+display a space after the line number.
+
+If you use a text editor (or write a program) to create an untokenized
+BASIC program, you can have a line of code that's 125 characters
+long. AMSB will accept it just fine, with LOAD. If a line is 126
+characters or longer, AMSB will silently ignore that line when
+LOADing.
+
+If you create a 125-character line (with a text editor) consisting
+only of a comment that begins with ! or ', without a space after the
+line number, LOAD it, then SAVE it, that line will be 129 bytes long
+in tokenized form. AMSB will LOAD it with no problems.
+
+If you hex-edit a SAVEd file to create a longer line, AMSB will
+accept that, too... up to 255 bytes. At 256 bytes, AMSB will lock
+up after LOAD.
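+
+If you're generating LISTed files with a program, it's worth checking
+for lines that LOAD would silently drop. A Python sketch, assuming
+the 125-character limit counts the line without its EOL byte (I
+haven't pinned that down exactly); the names are made up:
+

```python
ATASCII_EOL = 0x9B
MAX_LOADABLE = 125  # longest text line LOAD accepts, per the notes above


def overlong_lines(data: bytes):
    """Return (index, length) for each text line longer than the limit."""
    lines = data.split(bytes([ATASCII_EOL]))
    if lines and lines[-1] == b"":
        lines.pop()  # the final EOL leaves an empty trailing piece
    return [
        (i, len(line))
        for i, line in enumerate(lines)
        if len(line) > MAX_LOADABLE
    ]
```
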
+
+
+Crunching
+---------
+
+AMSB stores spaces in the tokenized program, just like other 8-bit
+MS BASICs do, but it requires you to put spaces between keywords and
+variables (unlike e.g. Commodore 64 BASIC). This seems to be because
+AMSB allows keywords inside of variable names: you can have a variable
+called LIFE (which contains the keyword IF) in AMSB, but you can't in
+C=64 BASIC (which gives a syntax error because it sees "L IF E").
+
+This applies to numbers, too: POKE710,0 is a syntax error in
+AMSB. This is because POKE710 is actually a valid variable name: try
+POKE710=123 followed by PRINT POKE710.
+
+However, the spaces aren't needed when the program is RUN. It would be
+possible to remove all the spaces outside of strings or comments and
+the program would still work fine.