author     B. Watson <urchlay@slackware.uk>  2024-05-17 05:09:45 -0400
committer  B. Watson <urchlay@slackware.uk>  2024-05-17 05:09:45 -0400
commit     96af9bc891987f6fcc560a6e403c5ada541d8699 (patch)
tree       6bdd20a1fdd7f31316d14fb5e233718b48713522 /unprotbas.rst
parent     d4064b55a7ddbb002ef80dbc0db60cd0d95cb1cd (diff)
unprotbas: added; blob2xex: tweak docs.
Diffstat (limited to 'unprotbas.rst')
-rw-r--r-- unprotbas.rst 156
1 files changed, 156 insertions, 0 deletions
diff --git a/unprotbas.rst b/unprotbas.rst
new file mode 100644
index 0000000..735681e
--- /dev/null
+++ b/unprotbas.rst
@@ -0,0 +1,156 @@
+=========
+unprotbas
+=========
+
+---------------------------------------------------
+Unprotect LIST-protected Atari 8-bit BASIC programs
+---------------------------------------------------
+
+.. include:: manhdr.rst
+
+SYNOPSIS
+========
+
+unprotbas [**-v**] [**-f**] [**-n**] [**-g**] **input-file** **output-file**
+
+DESCRIPTION
+===========
+
+**unprotbas** modifies LIST-protected Atari 8-bit BASIC programs,
+creating a new non-protected copy. See **DETAILS**, below, for an
+explanation of how the protection and unprotection work.
+
+**input-file** must be a tokenized Atari BASIC program. Use *-* to
+read from standard input.
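A tokenized (SAVEd) program begins with a 14-byte header: seven little-endian 16-bit words holding the saved values of BASIC's zero-page pointers at locations 128 to 141 (LOMEM, VNTP, VNTD, VVTP, STMTAB, STMCUR and STARP; the *Mapping the Atari* example under **DETAILS** PEEKs three of them). The following Python sketch splits a file into its variable name table, variable value table and statement table. It assumes the saved memory image starts at VNTP, immediately after the header, and ends just before STARP; the function names are hypothetical and not taken from **unprotbas**.

```python
import struct
from typing import NamedTuple, Tuple

HEADER_SIZE = 14  # seven 16-bit little-endian words


class Header(NamedTuple):
    """Saved copies of BASIC's zero-page pointers (locations 128-141)."""
    lomem: int
    vntp: int    # start of the variable name table
    vntd: int    # last byte of the name table (normally its zero byte)
    vvtp: int    # variable value table, 8 bytes per variable
    stmtab: int  # statement table (the program lines)
    stmcur: int  # line being executed when the program was SAVEd
    starp: int   # string/array area; the saved image ends just before it


def parse_header(data: bytes) -> Header:
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to be a tokenized BASIC program")
    return Header(*struct.unpack("<7H", data[:HEADER_SIZE]))


def file_offset(hdr: Header, addr: int) -> int:
    """Map a saved pointer to a file offset (assumes image starts at VNTP)."""
    return addr - hdr.vntp + HEADER_SIZE


def split_tables(data: bytes) -> Tuple[Header, bytes, bytes, bytes]:
    """Return (header, name table, value table, statement table)."""
    hdr = parse_header(data)

    def chunk(start: int, end: int) -> bytes:
        return data[file_offset(hdr, start):file_offset(hdr, end)]

    vnt = chunk(hdr.vntp, hdr.vvtp)      # names plus terminator byte
    vvt = chunk(hdr.vvtp, hdr.stmtab)    # (STMTAB - VVTP) // 8 variables
    code = chunk(hdr.stmtab, hdr.starp)  # every line, including 32768
    return hdr, vnt, vvt, code
```

Under these assumptions, the number of variables in a program is simply `len(vvt) // 8`.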
+
+**output-file** will be the unprotected tokenized BASIC program. If it
+already exists, it will be overwritten. Use *-* to write to standard
+output. **[TODO]** **unprotbas** will refuse to write to standard
+output if it's a terminal, since tokenized BASIC is binary data and
+may confuse the terminal.
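The planned terminal check is simple to express. As a minimal Python sketch (the `open_output` helper is hypothetical, not the tool's actual code), *-* maps to the binary standard output stream, which is refused when it is attached to a terminal:

```python
import sys
from typing import BinaryIO


def open_output(path: str) -> BinaryIO:
    """Open path for binary writing; "-" means standard output."""
    if path == "-":
        if sys.stdout.isatty():
            # Tokenized BASIC is binary; don't send it to a terminal.
            raise SystemExit("refusing to write binary data to a terminal")
        return sys.stdout.buffer
    return open(path, "wb")  # truncates (overwrites) an existing file
```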
+
+OPTIONS
+=======
+
+**-v**
+ Verbose operation.
+
+**-f**
+ Force the variable name table to be rebuilt, even if it looks OK.
+
+**-n**
+ Don't rebuild the variable name table (only fix the line pointers,
+ if needed).
+
+**-g**
+ Remove any "garbage" data from the end of the file. By default,
+ it's left as-is, in case it's actually data used by the program.
+
+EXIT STATUS
+===========
+
+Exit status is zero for success, non-zero for failure.
+
+DETAILS
+=======
+
+In the Atari BASIC world, it's possible to create a SAVEd (tokenized)
+program that can be RUN from disk (**RUN "D:FILE.BAS"**), but that
+either crashes the BASIC interpreter or LISTs as gibberish if it's
+LOADed. This is known as LIST-protection. Such programs are
+generally released to the world in protected form; the author
+privately keeps an unprotected copy so he can modify it. Today,
+collections such as the Holmes Archive contain many LIST-protected
+programs whose unprotected versions were never released.
+
+One example of LIST-protection, taken from *Mapping the Atari* (the
+**STMCUR** entry in the memory map), looks like this::
+
+ 32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI
+ 32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW
+
+To use it, add these two lines to your program, then execute them
+with **GOTO 32000** in immediate mode.
+
+This illustrates both types of protection, which can be (and usually
+are) applied to the same program:
+
+Variable name table scrambling
+ BASIC has specific rules about which variable names are legal, and
+ the tokenizer enforces them when each program line is entered.
+ However, BASIC doesn't use the variable names at runtime: tokenized
+ code refers to each variable by its number, and the names are only
+ used when program lines are entered or LISTed.
+
+ Replacing the variable names with binary gibberish makes the
+ program LIST-proof: LIST either displays the same control character
+ for every variable name, or a long string of binary garbage for each
+ one. The program will still RUN correctly. Note that the original
+ variable names are *gone*, and cannot be recovered.
+
+ Line 32000 in the example above does this job: it overwrites every
+ byte of the table with the EOL character (155). Since that value has
+ bit 7 set, BASIC sees each byte as a complete one-character name.
+
+ **unprotbas** detects a scrambled variable name table and builds a
+ new, valid one. Since there are no real variable names left in the
+ program, it invents new ones: A through Z, then A1 through A9, B1
+ through B9, and so on. Figuring out what each variable is for
+ requires human intelligence, since the generated names are
+ meaningless.
+
+ The **output-file** may be larger than the **input-file**, since
+ some kinds of variable name scrambling shrink the variable name
+ table to its minimum size (one byte per name), and the rebuilt table
+ needs more room.
+
+Bad next-line pointer
+ This is usually done with line number 32768. Yes, this line number
+ is outside the range BASIC accepts for program lines (0 to 32767),
+ but BASIC uses it internally to hold the current immediate-mode
+ command. When SAVE or CSAVE is executed, this line gets saved, too.
+
+ Every line of tokenized BASIC contains a line-length byte, which
+ BASIC uses as the offset from the start of that line to the start
+ of the next one. Before printing the READY prompt, BASIC walks
+ through every line of the program, following these offsets, in
+ order to delete any existing line 32768 (the previous immediate-mode
+ command). A line whose length byte is zero therefore points to
+ itself.
+
+ When BASIC tries to traverse a line of code that points to itself as
+ its "next" line, it gets stuck in an infinite loop. This not only
+ prevents LIST, it prevents any immediate-mode command from running:
+ after LOADing such a file, *nothing* will work (even pressing RESET
+ won't get you out of it). The only way to use such a program is to
+ RUN it with a filename, and if the program ever exits (due to END,
+ STOP, an error, or the Break key), BASIC will get stuck again.
+
+ This doesn't *have* to be done with line 32768. Any line that never
+ has to be traversed at runtime would work; in practice, that means a
+ regular line numbered higher than any code that ever gets executed,
+ usually the last line in the file.
+
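+ Line 32100 in the example above does this job, taking advantage of
+ the STMCUR pointer (locations 138 and 139), which holds the address
+ of the line of tokenized code currently being executed. While line
+ 32100 runs, STMCUR points at line 32100 itself, so the POKE zeroes
+ that line's length byte (at offset 2) just before SAVE writes the
+ program out.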
+
+ **unprotbas** fixes this simply by calculating what the pointer
+ should be (based on the tokens in the line) and changing it. No
+ information is lost by doing this.
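The scrambled-name check and the renaming scheme described under **Variable name table scrambling** can be sketched in Python. The sketch assumes the usual table layout: each name is an uppercase letter followed by letters or digits, string names end in **$** and array names in **(**, the last character of each name has bit 7 set, and a zero byte ends the table. It also assumes each variable's type can be read from the first byte of its 8-byte value-table entry. The helper names are hypothetical, and the real validity test in **unprotbas** may be stricter or looser.

```python
import re
from typing import List

SCALAR, ARRAY, STRING = "scalar", "array", "string"
LEGAL_NAME = re.compile(r"[A-Z][A-Z0-9]*[$(]?")


def split_names(vnt: bytes) -> List[str]:
    """Split a name table into names, stripping the end-of-name bit."""
    names, cur = [], bytearray()
    for b in vnt:
        if b == 0 and not cur:
            break                      # zero byte: end of the table
        cur.append(b & 0x7F)
        if b & 0x80:                   # bit 7 marks a name's last character
            names.append(cur.decode("latin-1"))
            cur = bytearray()
    return names


def vnt_is_valid(vnt: bytes, nvars: int) -> bool:
    names = split_names(vnt)
    return len(names) == nvars and all(LEGAL_NAME.fullmatch(n) for n in names)


def var_types(vvt: bytes) -> List[str]:
    """Read each variable's type from its 8-byte value-table entry."""
    kinds = {0x00: SCALAR, 0x40: ARRAY, 0x80: STRING}
    return [kinds.get(vvt[i] & 0xC0, SCALAR) for i in range(0, len(vvt), 8)]


def invent_names(types: List[str]) -> List[str]:
    """A..Z, then A1..A9, B1..B9, ..., with $ or ( appended as needed."""
    letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
    pool = letters + [f"{letter}{digit}"
                      for letter in letters for digit in range(1, 10)]
    if len(types) > len(pool):
        raise ValueError("more variables than invented names")
    suffix = {SCALAR: "", ARRAY: "(", STRING: "$"}
    return [pool[i] + suffix[t] for i, t in enumerate(types)]


def build_vnt(names: List[str]) -> bytes:
    out = bytearray()
    for name in names:
        raw = name.encode("ascii")
        out += raw[:-1]
        out.append(raw[-1] | 0x80)     # flag the last character
    out.append(0)                      # table terminator
    return bytes(out)
```

This also shows why the output can grow: for 30 scalar variables, a table scrambled down to one byte per name holds 30 bytes of names, while `build_vnt(invent_names([SCALAR] * 30))` is 35 bytes (26 one-letter names, four two-character names and the terminator).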
+
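The next-line pointer repair described under **Bad next-line pointer** can also be sketched in Python. This version relies on the statement-offset bytes inside each line, which the protection leaves intact: each statement starts with the offset of the next statement from the start of the line, so the offset stored in a line's last statement equals the correct line length. The last statement is taken to be one whose final byte is the end-of-line token (0x16), or a REM, DATA or ERROR- statement (tokens 0x00, 0x01 and, assumed here, 0x37), whose raw text runs to the end of the line. The function names are hypothetical; **unprotbas** may compute the length differently.

```python
EOL_TOKEN = 0x16                 # end-of-line operator token
RAW_TEXT = {0x00, 0x01, 0x37}    # REM, DATA, ERROR- (0x37 is assumed)


def correct_line_length(line: bytes) -> int:
    """Compute the length byte for the line at the start of `line`.

    `line` may run past the end of this line (e.g. to the end of the
    statement table); only the statement offsets are trusted.
    """
    pos = 3                              # skip line number and length byte
    while pos + 1 < len(line):
        nxt = line[pos]                  # offset of the next statement
        if nxt <= pos or nxt > len(line):
            break
        if line[pos + 1] in RAW_TEXT or line[nxt - 1] == EOL_TOKEN:
            return nxt                   # last statement ends the line
        pos = nxt
    raise ValueError("malformed line: cannot find its last statement")


def fix_line_pointers(code: bytearray) -> int:
    """Rewrite every line-length byte in a statement table, in place.

    Returns how many lines were changed.
    """
    changed, pos = 0, 0
    while pos + 3 <= len(code):
        length = correct_line_length(bytes(code[pos:]))
        if code[pos + 2] != length:
            code[pos + 2] = length
            changed += 1
        pos += length
    return changed
```

For example, `10 PRINT:END` with its length byte zeroed is stored as `0A 00 00 06 20 14 09 15 16`; `correct_line_length` follows the statement offsets 6 and 9, finds the end-of-line token at offset 8, and returns 9.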
+**unprotbas** can also remove extra data from the end of the file. A
+BASIC file can contain extra data after the end of the program. Some
+programs use this as a way
+to load arbitrary binary data into memory along with the program; for
+other programs, the extra data is truly garbage (e.g. an EOF character
+if the file came from a CP/M system, or padding to a block size if a
+dumb implementation of XMODEM was used to transfer the file).
+
+Normally, such "garbage" doesn't hurt anything, because BASIC ignores
+it. If you suspect it's causing a problem anyway, you can remove it
+with the **-g** option. If removing the "garbage" causes the program
+to fail to run, it wasn't garbage! **unprotbas** doesn't remove extra
+data by default, to be on the safe side.
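Locating the extra data only needs the file header: the program proper ends where the saved memory image reaches **STARP**. A minimal Python sketch, assuming the header is seven little-endian 16-bit pointers (VNTP second, STARP seventh) and that the image starts at VNTP straight after the 14-byte header; `split_trailing` is a hypothetical helper, not part of **unprotbas**:

```python
import struct
from typing import Tuple

HEADER_SIZE = 14


def split_trailing(data: bytes) -> Tuple[bytes, bytes]:
    """Split a SAVEd file into (program, extra data after the program)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to be a tokenized BASIC program")
    vntp, = struct.unpack_from("<H", data, 2)    # 2nd header word
    starp, = struct.unpack_from("<H", data, 12)  # 7th header word
    end = starp - vntp + HEADER_SIZE
    if end < HEADER_SIZE or end > len(data):
        raise ValueError("header pointers don't match the file size")
    return data[:end], data[end:]
```

With **-g**, only `program` would be written; without it, `program + extra` (the original bytes) would be kept.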
+
+.. include:: manftr.rst