author B. Watson <urchlay@slackware.uk> 2024-05-17 05:09:45 -0400
committer B. Watson <urchlay@slackware.uk> 2024-05-17 05:09:45 -0400
commit 96af9bc891987f6fcc560a6e403c5ada541d8699
tree 6bdd20a1fdd7f31316d14fb5e233718b48713522
parent d4064b55a7ddbb002ef80dbc0db60cd0d95cb1cd
unprotbas: added; blob2xex: tweak docs.
-rw-r--r-- Makefile 4
-rw-r--r-- blob2xex.1 12
-rw-r--r-- blob2xex.rst 10
-rw-r--r-- unprotbas.1 213
-rw-r--r-- unprotbas.c 429
-rw-r--r-- unprotbas.rst 156
6 files changed, 817 insertions, 7 deletions
diff --git a/Makefile b/Makefile
index 67767a0..8aa1acc 100644
--- a/Makefile
+++ b/Makefile
@@ -16,9 +16,9 @@ CC=gcc
CFLAGS=-Wall $(COPT) -ansi -D_GNU_SOURCE -DVERSION=\"$(VERSION)\"
# BINS and SCRIPTS go in $BINDIR, DOCS go in $DOCDIR
-BINS=a8eol xfd2atr atr2xfd blob2c cart2xex fenders xexsplit xexcat atrsize rom2cart unmac65 axe blob2xex xexamine xex1to2
+BINS=a8eol xfd2atr atr2xfd blob2c cart2xex fenders xexsplit xexcat atrsize rom2cart unmac65 axe blob2xex xexamine xex1to2 unprotbas
SCRIPTS=dasm2atasm a8utf8
-MANS=a8eol.1 xfd2atr.1 atr2xfd.1 blob2c.1 cart2xex.1 fenders.1 xexsplit.1 xexcat.1 atrsize.1 rom2cart.1 unmac65.1 axe.1 dasm2atasm.1 a8utf8.1 blob2xex.1 xexamine.1 xex1to2.1
+MANS=a8eol.1 xfd2atr.1 atr2xfd.1 blob2c.1 cart2xex.1 fenders.1 xexsplit.1 xexcat.1 atrsize.1 rom2cart.1 unmac65.1 axe.1 dasm2atasm.1 a8utf8.1 blob2xex.1 xexamine.1 xex1to2.1 unprotbas.1
MAN5S=xex.5
MAN7S=atascii.7
DOCS=README equates.inc *.dasm LICENSE ksiders/atr.txt
diff --git a/blob2xex.1 b/blob2xex.1
index 77b15b4..a13f982 100644
--- a/blob2xex.1
+++ b/blob2xex.1
@@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
-.TH "BLOB2XEX" 1 "2024-05-16" "0.2.1" "Urchlay's Atari 8-bit Tools"
+.TH "BLOB2XEX" 1 "2024-05-17" "0.2.1" "Urchlay's Atari 8-bit Tools"
.SH NAME
blob2xex \- Create Atari 8-bit executables from arbitrary data
.\" RST source for blob2xex(1) man page. Convert with:
@@ -69,6 +69,10 @@ filename, anyway?
.sp
Addresses, offsets, and sizes may be given in decimal or hex. Hex
addresses must be prefixed with either \fB$\fP or \fB0x\fP\&.
+.sp
+It\(aqs impossible to create a segment that would wrap around the Atari\(aqs
+64KB address space. Once address \fB$FFFF\fP is reached, no more data is
+read from the input file.
.SH OPTIONS
.sp
A space is required between an option and its argument; use e.g. \fB\-l 0x2000\fP,
@@ -127,11 +131,13 @@ created. There are only a few possible warnings:
.INDENT 0.0
.TP
.B start/end address XXXX loads into ROM.
-This means your .exe file\(aqs start/end addresses will load the
+This means your .xex file\(aqs start/end addresses will load the
file into ROM (or the unmapped area at \fB$C000\fP on a 400/800).
Normally this means the .xex file won\(aqt load properly on the
Atari, but feel free to ignore this warning if you know exactly
-what you\(aqre doing.
+what you\(aqre doing. Example: if your .xex file is intended to
+be loaded on an 800 with an Axlon memory upgrade, mapped at
+\fB$C000\fP, this warning can be ignored.
.TP
.B extra arguments after last input file ignored.
You gave at least one option that would affect the next file,
diff --git a/blob2xex.rst b/blob2xex.rst
index 0921886..a35bf65 100644
--- a/blob2xex.rst
+++ b/blob2xex.rst
@@ -50,6 +50,10 @@ filename, anyway?
Addresses, offsets, and sizes may be given in decimal or hex. Hex
addresses must be prefixed with either **$** or **0x**.
+It's impossible to create a segment that would wrap around the Atari's
+64KB address space. Once address **$FFFF** is reached, no more data is
+read from the input file.
+
OPTIONS
=======
@@ -104,11 +108,13 @@ Messages containing *warning* are non-fatal, and the output file is
created. There are only a few possible warnings:
start/end address XXXX loads into ROM.
- This means your .exe file's start/end addresses will load the
+ This means your .xex file's start/end addresses will load the
file into ROM (or the unmapped area at **$C000** on a 400/800).
Normally this means the .xex file won't load properly on the
Atari, but feel free to ignore this warning if you know exactly
- what you're doing.
+ what you're doing. Example: if your .xex file is intended to
+ be loaded on an 800 with an Axlon memory upgrade, mapped at
+ **$C000**, this warning can be ignored.
extra arguments after last input file ignored.
You gave at least one option that would affect the next file,
diff --git a/unprotbas.1 b/unprotbas.1
new file mode 100644
index 0000000..92e6b66
--- /dev/null
+++ b/unprotbas.1
@@ -0,0 +1,213 @@
+.\" Man page generated from reStructuredText.
+.
+.
+.nr rst2man-indent-level 0
+.
+.de1 rstReportMargin
+\\$1 \\n[an-margin]
+level \\n[rst2man-indent-level]
+level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
+-
+\\n[rst2man-indent0]
+\\n[rst2man-indent1]
+\\n[rst2man-indent2]
+..
+.de1 INDENT
+.\" .rstReportMargin pre:
+. RS \\$1
+. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
+. nr rst2man-indent-level +1
+.\" .rstReportMargin post:
+..
+.de UNINDENT
+. RE
+.\" indent \\n[an-margin]
+.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.nr rst2man-indent-level -1
+.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
+..
+.TH "UNPROTBAS" 1 "2024-05-17" "0.2.1" "Urchlay's Atari 8-bit Tools"
+.SH NAME
+unprotbas \- Unprotect LIST-protected Atari 8-bit BASIC programs
+.SH SYNOPSIS
+.sp
+unprotbas [\fB\-v\fP] [\fB\-f\fP] [\fB\-n\fP] [\fB\-g\fP] \fBinput\-file\fP \fBoutput\-file\fP
+.SH DESCRIPTION
+.sp
+\fBunprotbas\fP modifies LIST\-protected Atari 8\-bit BASIC programs,
+creating a new non\-protected copy. See \fBDETAILS\fP, below, to
+understand how the protection and unprotection work.
+.sp
+\fBinput\-file\fP must be a tokenized Atari BASIC program. Use \fI\-\fP to
+read from standard input.
+.sp
+\fBoutput\-file\fP will be the unprotected tokenized BASIC program. If it
+already exists, it will be overwritten. Use \fI\-\fP to write to standard
+output, but \fB[TODO]\fP \fBunprotbas\fP will refuse to write to standard
+output if it\(aqs a terminal (since tokenized BASIC is binary data and
+may confuse the terminal).
+.SH OPTIONS
+.INDENT 0.0
+.TP
+.B \fB\-v\fP
+Verbose operation.
+.TP
+.B \fB\-f\fP
+Force the variable name table to be rebuilt, even if it looks OK.
+.TP
+.B \fB\-n\fP
+Don\(aqt rebuild the variable table (only fix the line pointers, if
+needed).
+.TP
+.B \fB\-g\fP
+Remove any "garbage" data from the end of the file. By default,
+it\(aqs left as\-is, in case it\(aqs actually data used by the program.
+.UNINDENT
+.SH EXIT STATUS
+.sp
+Exit status is zero for success, non\-zero for failure.
+.SH DETAILS
+.sp
+In the Atari BASIC world, it\(aqs possible to create a SAVEd (tokenized)
+program that can be RUN from disk (\fBRUN "D:FILE.BAS"\fP) but if
+it\(aqs LOADed, it will either crash the BASIC interpreter, or LIST
+as gibberish. This is known as LIST\-protection. Such programs are
+generally released to the world in protected form; the author
+privately keeps an unprotected copy so he can modify it. Today,
+collections such as the Holmes Archive contain many
+LIST\-protected programs for which the unprotected version was never
+released.
+.sp
+One example of LIST\-protection, taken from \fIMapping the Atari\fP (the
+\fBSTMCUR\fP entry in the memory map) looks like:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI
+32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+To use, add the 2 lines of code to your program, then execute them
+with \fBGOTO 32000\fP in immediate mode.
+.sp
+This illustrates both types of protection, which can be (and usually
+are) applied to the same program:
+.INDENT 0.0
+.TP
+.B Variable name table scrambling
+BASIC has specific rules about which variable names are legal, and
+the tokenizer enforces them when each program line is entered.
+However, the interpreter doesn\(aqt use the variable names at
+runtime, when the tokenized program is executed.
+.sp
+Replacing the variable names with binary gibberish will render the
+program LIST\-proof, either replacing every variable name with the
+same control character, or causing LIST to display a long string of
+binary garbage for each variable name... but the program will still
+RUN correctly. Note that the original variable names are \fIgone\fP,
+and cannot be recovered.
+.sp
+Line 32000 in the example above does this job, replacing every
+variable name with the EOL character (155).
+.sp
+\fBunprotbas\fP detects a scrambled variable name table, and builds
+a new one that\(aqs valid. However, since there are no real variable
+names in the program, the recovery process just invents new ones,
+named A through Z, A1 through A9, B1 through B9, and so on. It\(aqll
+require human intelligence to figure out what each variable is for,
+since the names are meaningless.
+.sp
+The \fBoutput\-file\fP may be larger than the \fBinput\-file\fP was, since
+some types of variable\-name scrambling shrink the variable name
+table to the minimum size (one byte per name); the rebuilt table
+will be larger.
+.TP
+.B Bad next\-line pointer
+Generally, this is done with line number 32768. Yes, this line
+number is outside the range BASIC accepts... but BASIC uses it
+internally for immediate\-mode commands. And when SAVE or CSAVE are
+executed, this line gets saved, too.
+.sp
+Every line of tokenized BASIC contains a line length byte, which
+BASIC adds to the line\(aqs address to find the next line. Before
+printing the READY prompt, BASIC iterates over every line of code in
+the program, using the next\-line pointers, in order to delete any
+existing line 32768 (the previous immediate mode command). If any
+line\(aqs length byte is set to zero, that line points to itself.
+.sp
+When BASIC tries to traverse a line of code that points to itself as
+its "next" line, it gets stuck in an infinite loop. This prevents not
+only LIST but every other immediate mode command as well:
+after LOADing such a file, \fInothing\fP will work (even pressing RESET
+won\(aqt get you out of it). The only way to use such a program is to
+RUN it by filename, and if the program ever exits
+(due to END, STOP, an error, or the Break key), BASIC will get stuck
+again.
+.sp
+This doesn\(aqt \fIhave\fP to be done with line 32768. Any line of code
+that doesn\(aqt have to be traversed at runtime would work (in other
+words, a regular line whose line number is higher than any code that
+ever gets executed, usually the last line in the file).
+.sp
+Line 32100 in the example above does this job, taking advantage of
+the STMCUR pointer used by BASIC, which holds the address of the
+line of tokenized code currently being executed.
+.sp
+\fBunprotbas\fP fixes this simply by calculating what the pointer
+should be (based on the tokens in the line) and changing it. No
+information is lost by doing this.
+.UNINDENT
+.sp
+One more thing \fBunprotbas\fP can do is remove extra data from the end
+of the file. It\(aqs possible for BASIC files to contain extra data that
+occurs after the end of the program. Some programs use this as a way
+to load arbitrary binary data into memory along with the program; for
+other programs, the extra data is truly garbage (e.g. an EOF character
+if the file came from a CP/M system, or padding to a block size if a
+dumb implementation of XMODEM was used to transfer the file).
+.sp
+Normally, such "garbage" doesn\(aqt hurt anything. BASIC ignores it. Or
+it normally does... if you suspect it\(aqs causing a problem, you can
+remove it with the \fB\-g\fP option. If removing the "garbage" causes the
+program to fail to run, it wasn\(aqt garbage! \fBunprotbas\fP doesn\(aqt
+remove extra data by default, to be on the safe side.
+.SH COPYRIGHT
+.sp
+WTFPL. See \fI\%http://www.wtfpl.net/txt/copying/\fP for details.
+.SH AUTHOR
+.INDENT 0.0
+.IP B. 3
+Watson <\fI\%urchlay@slackware.uk\fP>; Urchlay on irc.libera.chat \fI##atari\fP\&.
+.UNINDENT
+.SH SEE ALSO
+.sp
+\fBa8eol\fP(1),
+\fBa8utf8\fP(1),
+\fBatr2xfd\fP(1),
+\fBatrsize\fP(1),
+\fBaxe\fP(1),
+\fBblob2c\fP(1),
+\fBblob2xex\fP(1),
+\fBcart2xex\fP(1),
+\fBdasm2atasm\fP(1),
+\fBf2toxex\fP(1),
+\fBfenders\fP(1),
+\fBrom2cart\fP(1),
+\fBunmac65\fP(1),
+\fBxexamine\fP(1),
+\fBxexcat\fP(1),
+\fBxexsplit\fP(1),
+\fBxfd2atr\fP(1),
+\fBxex\fP(5),
+\fBatascii\fP(7).
+.sp
+Any good Atari 8\-bit book: \fIDe Re Atari\fP, \fIThe Atari BASIC Reference
+Manual\fP, the \fIOS Users\(aq Guide\fP, \fIMapping the Atari\fP, etc.
+.\" Generated by docutils manpage writer.
+.
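The man page above describes BASIC finding each next line by adding the line length byte to the line's address, and hanging when that byte is zero. The following is a minimal sketch of the same walk, assuming a buffer that starts at the first tokenized line (line number LSB/MSB, then the length byte, as laid out in the fixline() comment in unprotbas.c). find_bad_line is a hypothetical name, not part of unprotbas. Instead of looping forever, it stops at the first line whose length is below the 6-byte minimum.

```c
#include <stddef.h>

/* Hypothetical helper (not part of unprotbas): walk tokenized BASIC
   lines by their length bytes, as BASIC does, and return the buffer
   offset of the first line whose length byte is < 6. A zero length
   makes a line "point to itself", so a naive walk would never advance.
   Returns -1 if no such line is found before line 32768 (the
   immediate-mode line) or the end of the buffer. */
long find_bad_line(const unsigned char *buf, size_t len) {
    size_t pos = 0;
    unsigned lineno, linelen;

    while (pos + 3 <= len) {
        lineno = buf[pos] | (buf[pos + 1] << 8);   /* LSB, MSB */
        linelen = buf[pos + 2];
        if (linelen < 6) return (long)pos;
        if (lineno == 32768U) break;   /* immediate-mode line is last */
        pos += linelen;
    }
    return -1;
}
```

A real file would first need its 14-byte header skipped and STMTAB converted to a file offset, as main() does when it computes codestart.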
diff --git a/unprotbas.c b/unprotbas.c
new file mode 100644
index 0000000..9c45fbb
--- /dev/null
+++ b/unprotbas.c
@@ -0,0 +1,429 @@
+/**** TODO:
+ if the rebuilt variable name table ends up larger than the
+ scrambled one, the rest of the program needs to be moved upwards
+ in memory to make room for it. currently this isn't done, so
+ the variable *value* table gets corrupted by the last few
+ variable names overwriting the first few values. */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+/* attempt to fix a "list-protected" Atari 8-bit BASIC program.
+ we don't fully detokenize, so this won't fix truly corrupted
+ files.
+
+ the "fix" is in 2 parts:
+ 1. fix any invalid (0-byte) offsets after a line number. this is
+ what causes BASIC to lock up.
+ 2. if the variable names were overwritten (e.g. with EOL characters,
+ or whatever), we "fix" that by making up new variable names.
+*/
+
+#define STM_OFFSET 0xf2
+
+/* entire file gets read into memory (for now) */
+unsigned char data[65536];
+
+/* BASIC 14-byte header values */
+unsigned short lomem;
+unsigned short vntp;
+unsigned short vntd;
+unsigned short vvtp;
+unsigned short stmtab;
+unsigned short stmcur;
+unsigned short starp;
+
+/* positions where various parts of the file start,
+ derived from the header vars above. */
+unsigned short codestart;
+unsigned short vnstart;
+unsigned short vvstart;
+int filelen;
+
+/* name of executable, taken from argv[0] */
+char *self;
+
+/* these are set by the various command-line switches */
+int keepvars = 0;
+int forcevars = 0;
+int keepgarbage = 1;
+int verbose = 0;
+
+/* file handles */
+FILE *input_file = NULL;
+FILE *output_file = NULL;
+
+void die(const char *msg) {
+ fprintf(stderr, "%s: %s\n", self, msg);
+ exit(1);
+}
+
+/* read entire file into memory */
+int readfile(void) {
+ int got = fread(data, 1, 65535, input_file);
+ fprintf(stderr, "read %d bytes\n", got);
+ return got;
+}
+
+/* get a 16-bit value from the file, in 6502 LSB/MSB order. */
+unsigned short getword(int addr) {
+ return data[addr] | (data[addr + 1] << 8);
+}
+
+/* fixline() calculates & sets correct line length, by iterating
+ over the statement(s) within the line. the last statement's
+ offset will be the same as the line offset should have been,
+ if it weren't zeroed. when reading this code, it's helpful to
+ know that the lengths (line and statement) are counted from the
+ start of the line in memory.
+
+ A line with only a line number and one token (such as END) would have a
+ line length of 6: 2 for the 16-bit line number, 1 for the length byte
+ itself, 1 for the statement length byte (also 6), 1 for the END token, and one
+ for the end-of-line token.
+
+ A line with two statements: 10 ?:END
+ offset value meaning
+ 0 0A line number (low byte)
+ 1 00 line number (high byte)
+ 2 09 line length (or, offset to next line) [!]
+ 3 06 offset to next statement *from the start of the line*
+ 4 28 token for "?"
+ 5 14 token for : (end of statement)
+ 6 09 offset to next statement [!]
+ 7 15 token for END
+ 8 16 token for end-of-line [*]
+ 9 ?? (line number of next line, low byte)
+
+ Note the values marked with [!] are equal.
+
+ [*] end-of-line is $16 *except* for REM and DATA, which are
+ terminated with $9B instead.
+*/
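+int fixline(int linepos) {
+ /* the first statement-offset byte comes right after the 16-bit
+ line number and the line length byte, at linepos + 3. */
+ int token, pos = 3, offset;
+
+ for(;;) {
+ offset = data[linepos + pos];
+ /* statement offsets must increase, or we'd loop forever */
+ if(offset <= pos) die("corrupt statement offset, can't fix line");
+ /* last byte of this statement: $14 means another statement
+ follows; anything else ($16, or $9B for REM/DATA) ends the line. */
+ token = data[linepos + offset - 1];
+ fprintf(stderr, "offset %02x token %02x\n", offset, token);
+ if(token != 0x14) break;
+ pos = offset;
+ }
+
+ data[linepos + 2] = offset;
+ return offset;
+}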
+
+/* Iterate over all the tokenized lines. If any of them have invalid
+ line lengths (<=5), call fixline() on them. */
+int fixcode(void) {
+ int result = 0;
+ int pos = codestart;
+ int offset, lineno = -1, tmpno;
+
+ while(pos < filelen) {
+ tmpno = getword(pos);
+ if(tmpno <= lineno) {
+ fprintf(stderr, "Warning: line number %d at offset %04x is <= previous line number %d\n",
+ tmpno, pos, lineno);
+ }
+ lineno = tmpno;
+
+ offset = data[pos + 2];
+ /* fprintf(stderr, "pos %d, line #%d, offset %d\n", pos, lineno, offset); */
+ if(offset < 6) {
+ fprintf(stderr, "Found invalid offset %d (<6) at line %d\n", offset, lineno);
+ offset = fixline(pos);
+ result++;
+ }
+ pos += offset;
+
+ /* Atari BASIC tolerates garbage after the last tokenized line,
+ so we must do likewise. */
+ if(lineno == 32768) break;
+ }
+
+ fprintf(stderr, "End program pos %04x/%d\n", pos, pos);
+
+ if(filelen > pos) {
+ fprintf(stderr, "trailing garbage at EOF, %d bytes, %s\n",
+ filelen - pos, (keepgarbage ? "keeping" : "removing"));
+ if(!keepgarbage) filelen = pos;
+ }
+
+ return result;
+}
+
+/* Fixing the variables is a bit more work than it seems like
+ it might be, because the last byte of the name has to match
+ the type (inverse video "(" for numeric array, inverse "$" for
+ string, inverse last character of name for scalars). To do
+ this right, we have to examine the variable value table to
+ find out the type of each variable.
+
+ Each variable type gets assigned A to Z, then A1 to A9, B1 to B9,
+ etc. This means there will be A, A$, and A( variables, which might
+ be a bit confusing, but we have to keep the generated name table as
+ short as possible, because we can't extend the size of the table in
+ the file.
+
+ We can find the actual table size in the file by subtracting VNTP
+ (start of variable name table) from VNTD (end of variable name table),
+ and if we run out of space for the generated names, something is
+ seriously off...
+
+ The maximum number of variable names is 128. If all 128 vars are in
+ use, the minimum table size is 230 (26 one-letter names, 102 2-letter
+ or letter+number or one-letter string/array names).
+
+*/
+
+int fixvars(void) {
+ int vp = vnstart, vv = vvstart;
+ int strings = 0, arrays = 0, scalars = 0, varname = 0, varnum = 0;
+ int bad = 0;
+
+ /* See if the variables even need fixing.
+
+ This code is simpler than it should be: it checks that all
+ characters in the variable name table are valid, but doesn't
+ check that they're in valid sequences (example: a variable name
+ that's just an inverse dollar sign would be considered OK).
+ Also multiple variables of the same type with the same name
+ would be OK.
+
+ However, if all the bytes are the same value, even if it's a
+ valid character, that's correctly detected as invalid.
+ */
+
+ if(vntp == vntd) {
+ fprintf(stderr, "No variables\n");
+ return 0;
+ }
+
+ vp = vnstart + 1;
+ bad = 1;
+ while(vp < vvstart - 1) {
+ if(data[vp] != data[vnstart]) bad = 0;
+ vp++;
+ }
+
+ vp = vnstart;
+ while(vp < vvstart) {
+ unsigned char c = data[vp];
+ fprintf(stderr, "%04x/%04x: %04x\n", vp, vvstart, c);
+
+ /* allow a null byte only at the end of the table! */
+ /* if(c == 0 && vp == vvstart - 1) break; */
+ /* new rule: treat a null byte as end-of-table, ignore any
+ junk between it and VNTP. */
+ if(c == 0) break;
+
+ vp++;
+
+ /* inverse $ or ( is OK */
+ if(c == 0xa4 || c == 0xa8) continue;
+
+ /* numbers and letters are allowed, inverse or normal. */
+ c &= 0x7f;
+ if(c >= 0x30 && c <= 0x39) continue;
+ if(c >= 0x41 && c <= 0x5a) continue;
+
+ bad++;
+ break;
+ }
+ if(!forcevars && !bad) return 0;
+
+ vp = vnstart;
+ while(vv < codestart) {
+ unsigned char sigil = 0;
+ /* type: scalar = 0, array = 1, string = 2 */
+ unsigned char type = data[vv] >> 6;
+ /* fprintf(stderr, "%04x: %04x, %d\n", vv, data[vv], type); */
+
+ if(varnum != data[vv+1]) {
+ fprintf(stderr, "Warning: variable value is corrupt!\n");
+ }
+ varnum++;
+
+ switch(type) {
+ case 1: varname = arrays++; sigil = 0xa8; break;
+ case 2: varname = strings++; sigil = 0xa4; break;
+ default: varname = scalars++; break;
+ }
+
+ if(varname < 26) {
+ data[vp] = ('A' + varname);
+ } else {
+ varname -= 26;
+ data[vp++] = 'A' + (varname / 9);
+ data[vp] = ('1' + (varname % 9));
+ }
+
+ if(sigil) {
+ vp++;
+ data[vp++] = sigil;
+ } else {
+ data[vp] |= 0x80;
+ vp++;
+ }
+
+ vv += 8;
+ }
+
+ /* there's supposed to be a null byte at the end of the table, unless
+ all 128 table slots are used. */
+ if(varnum < 128) data[vp] = 0;
+
+ /* fixup the VNTD pointer */
+ vntd = vntp + (vp - vnstart);
+ data[4] = vntd & 0xff;
+ data[5] = vntd >> 8;
+
+ fprintf(stderr, "%d variables, VNTD adjusted to %04x\n", varnum, vntd);
+ return 1;
+}
+
+void print_help(void) {
+ fprintf(stderr, "Usage: %s [-v] [-f] [-n] [-g] <inputfile> <outputfile>\n", self);
+ fprintf(stderr, "-v: verbose\n");
+ fprintf(stderr, "-f: force variable name table rebuild\n");
+ fprintf(stderr, "-n: do not rebuild variable name table, even if it's invalid\n");
+ fprintf(stderr, "-g: remove trailing garbage, if present\n");
+ fprintf(stderr, "Use - as a filename to read from stdin and/or write to stdout\n");
+}
+
+void invalid_args(const char *arg) {
+ fprintf(stderr, "%s: Invalid argument '%s'\n\n", self, arg);
+ print_help();
+ exit(1);
+}
+
+FILE *open_file(const char *name, const char *mode) {
+ FILE *fp;
+ if(!(fp = fopen(name, mode))) {
+ perror(name);
+ exit(1);
+ }
+ return fp;
+}
+
+void open_input(const char *name) {
+ if(!name) {
+ if(freopen(NULL, "rb", stdin)) {
+ input_file = stdin;
+ return;
+ } else {
+ perror("stdin");
+ exit(1);
+ }
+ }
+
+ input_file = open_file(name, "rb");
+}
+
+void open_output(const char *name) {
+ if(!name) {
+ if(freopen(NULL, "wb", stdout)) {
+ output_file = stdout;
+ return;
+ } else {
+ perror("stdout");
+ exit(1);
+ }
+ }
+
+ output_file = open_file(name, "wb");
+}
+
+void parse_args(int argc, char **argv) {
+ self = *argv;
+ if(argc < 2) {
+ print_help();
+ exit(0);
+ }
+ while(++argv, --argc) {
+ if((*argv)[0] == '-') {
+ switch((*argv)[1]) {
+ case 'v': verbose++; break;
+ case 'f': forcevars++; break;
+ case 'n': keepvars++; break;
+ case 'g': keepgarbage = 0; break;
+ case 0:
+ if(!input_file)
+ open_input(NULL);
+ else if(!output_file)
+ open_output(NULL);
+ else
+ invalid_args(*argv);
+ break;
+ default: invalid_args(*argv); break;
+ }
+ } else {
+ if(!input_file)
+ open_input(*argv);
+ else if(!output_file)
+ open_output(*argv);
+ else
+ invalid_args(*argv);
+ }
+ }
+
+ if(!input_file) die("no input file given (use - for stdin)");
+ if(!output_file) die("no output file given (use - for stdout)");
+ if(keepvars && forcevars) die("-f and -n are mutually exclusive");
+}
+
+int main(int argc, char **argv) {
+ parse_args(argc, argv);
+
+ filelen = readfile();
+
+ lomem = getword(0);
+ vntp = getword(2);
+ vntd = getword(4);
+ vvtp = getword(6);
+ stmtab = getword(8);
+ stmcur = getword(10);
+ starp = getword(12);
+ codestart = stmtab - STM_OFFSET - (vntp - 256);
+ vnstart = vntp - 256 + 14;
+ vvstart = vvtp - 256 + 14;
+
+ if(lomem) die("This doesn't look like an Atari BASIC program (no $0000 signature)");
+
+ fprintf(stderr, "LOMEM %04x\n", lomem);
+ fprintf(stderr, "VNTP %04x\n", vntp);
+ fprintf(stderr, "VNTD %04x\n", vntd);
+ fprintf(stderr, "VVTP %04x\n", vvtp);
+ fprintf(stderr, "STMTAB %04x, codestart %04x\n", stmtab, codestart);
+ fprintf(stderr, "STMCUR %04x\n", stmcur);
+ fprintf(stderr, "STARP %04x\n", starp);
+ fprintf(stderr, "vvstart %04x\n", vvstart);
+
+ /*
+ fprintf(stderr, "data at STMTAB (we hope):\n");
+ for(int i=codestart; i<filelen; i++) {
+ fprintf(stderr, "%02x ", data[i]);
+ }
+ fprintf(stderr, "\n");
+ */
+
+ if(!keepvars) {
+ if(fixvars())
+ fprintf(stderr, "Variable names replaced\n");
+ else
+ fprintf(stderr, "Variable names were already OK\n");
+ }
+
+ if(fixcode())
+ fprintf(stderr, "Fixed invalid offset in code\n");
+ else
+ fprintf(stderr, "No invalid offsets (maybe wasn't protected?)\n");
+
+ if(fwrite(data, 1, filelen, output_file) != (size_t)filelen || fclose(output_file))
+ die("error writing output file");
+ return 0;
+}
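main() above turns the saved header pointers into file offsets. The sketch below shows the same arithmetic on its own, under an assumption that normally holds for SAVEd files: the saved VNTP is $0100, so saved address $0100 lands at file offset 14, right after the 14-byte header. In that case main()'s codestart, vnstart, and vvstart formulas all reduce to "pointer - $100 + 14". The names struct bas_header, parse_header() and ptr_to_offset() are hypothetical, for illustration only.

```c
/* Hypothetical illustration of the 14-byte SAVE header: seven 16-bit
   pointers stored LSB first at file offsets 0-13. */
struct bas_header {
    unsigned lomem, vntp, vntd, vvtp, stmtab, stmcur, starp;
};

/* read a 6502-order (LSB, MSB) word */
static unsigned word_at(const unsigned char *p) {
    return p[0] | (p[1] << 8);
}

/* Fill in *h from buf. Returns 0 on success, -1 if the buffer is
   shorter than the header or LOMEM isn't the expected $0000. */
int parse_header(const unsigned char *buf, unsigned long len,
                 struct bas_header *h) {
    if (len < 14) return -1;
    h->lomem  = word_at(buf);
    h->vntp   = word_at(buf + 2);
    h->vntd   = word_at(buf + 4);
    h->vvtp   = word_at(buf + 6);
    h->stmtab = word_at(buf + 8);
    h->stmcur = word_at(buf + 10);
    h->starp  = word_at(buf + 12);
    return h->lomem == 0 ? 0 : -1;
}

/* Convert a saved pointer to a file offset, assuming VNTP == $0100. */
long ptr_to_offset(unsigned ptr) {
    return (long)ptr - 0x100 + 14;
}
```

For a header with STMTAB = $010E, ptr_to_offset() gives 28, the same value main() computes as stmtab - STM_OFFSET when VNTP is $0100.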
diff --git a/unprotbas.rst b/unprotbas.rst
new file mode 100644
index 0000000..735681e
--- /dev/null
+++ b/unprotbas.rst
@@ -0,0 +1,156 @@
+=========
+unprotbas
+=========
+
+---------------------------------------------------
+Unprotect LIST-protected Atari 8-bit BASIC programs
+---------------------------------------------------
+
+.. include:: manhdr.rst
+
+SYNOPSIS
+========
+
+unprotbas [**-v**] [**-f**] [**-n**] [**-g**] **input-file** **output-file**
+
+DESCRIPTION
+===========
+
+**unprotbas** modifies LIST-protected Atari 8-bit BASIC programs,
+creating a new non-protected copy. See **DETAILS**, below, to
+understand how the protection and unprotection work.
+
+**input-file** must be a tokenized Atari BASIC program. Use *-* to
+read from standard input.
+
+**output-file** will be the unprotected tokenized BASIC program. If it
+already exists, it will be overwritten. Use *-* to write to standard
+output, but **[TODO]** **unprotbas** will refuse to write to standard
+output if it's a terminal (since tokenized BASIC is binary data and
+may confuse the terminal).
+
+OPTIONS
+=======
+
+**-v**
+ Verbose operation.
+
+**-f**
+ Force the variable name table to be rebuilt, even if it looks OK.
+
+**-n**
+ Don't rebuild the variable table (only fix the line pointers, if
+ needed).
+
+**-g**
+ Remove any "garbage" data from the end of the file. By default,
+ it's left as-is, in case it's actually data used by the program.
+
+EXIT STATUS
+===========
+
+Exit status is zero for success, non-zero for failure.
+
+DETAILS
+=======
+
+In the Atari BASIC world, it's possible to create a SAVEd (tokenized)
+program that can be RUN from disk (**RUN "D:FILE.BAS"**) but if
+it's LOADed, it will either crash the BASIC interpreter, or LIST
+as gibberish. This is known as LIST-protection. Such programs are
+generally released to the world in protected form; the author
+privately keeps an unprotected copy so he can modify it. Today,
+collections such as the Holmes Archive contain many
+LIST-protected programs for which the unprotected version was never
+released.
+
+One example of LIST-protection, taken from *Mapping the Atari* (the
+**STMCUR** entry in the memory map) looks like::
+
+ 32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI
+ 32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW
+
+To use, add the 2 lines of code to your program, then execute them
+with **GOTO 32000** in immediate mode.
+
+This illustrates both types of protection, which can be (and usually
+are) applied to the same program:
+
+Variable name table scrambling
+ BASIC has specific rules about which variable names are legal, and
+ the tokenizer enforces them when each program line is entered.
+ However, the interpreter doesn't use the variable names at
+ runtime, when the tokenized program is executed.
+
+ Replacing the variable names with binary gibberish will render the
+ program LIST-proof, either replacing every variable name with the
+ same control character, or causing LIST to display a long string of
+ binary garbage for each variable name... but the program will still
+ RUN correctly. Note that the original variable names are *gone*,
+ and cannot be recovered.
+
+ Line 32000 in the example above does this job, replacing every
+ variable name with the EOL character (155).
+
+ **unprotbas** detects a scrambled variable name table, and builds
+ a new one that's valid. However, since there are no real variable
+ names in the program, the recovery process just invents new ones,
+ named A through Z, A1 through A9, B1 through B9, and so on. It'll
+ require human intelligence to figure out what each variable is for,
+ since the names are meaningless.
+
+ The **output-file** may be larger than the **input-file** was, since
+ some types of variable-name scrambling shrink the variable name
+ table to the minimum size (one byte per name); the rebuilt table
+ will be larger.
+
+Bad next-line pointer
+ Generally, this is done with line number 32768. Yes, this line
+ number is outside the range BASIC accepts... but BASIC uses it
+ internally for immediate-mode commands. And when SAVE or CSAVE are
+ executed, this line gets saved, too.
+
+ Every line of tokenized BASIC contains a line length byte, which
+ BASIC adds to the line's address to find the next line. Before
+ printing the READY prompt, BASIC iterates over every line of code in
+ the program, using the next-line pointers, in order to delete any
+ existing line 32768 (the previous immediate mode command). If any
+ line's length byte is set to zero, that line points to itself.
+
+ When BASIC tries to traverse a line of code that points to itself as
+ its "next" line, it gets stuck in an infinite loop. This prevents not
+ only LIST but every other immediate mode command as well:
+ after LOADing such a file, *nothing* will work (even pressing RESET
+ won't get you out of it). The only way to use such a program is to
+ RUN it by filename, and if the program ever exits
+ (due to END, STOP, an error, or the Break key), BASIC will get stuck
+ again.
+
+ This doesn't *have* to be done with line 32768. Any line of code
+ that doesn't have to be traversed at runtime would work (in other
+ words, a regular line whose line number is higher than any code that
+ ever gets executed, usually the last line in the file).
+
+ Line 32100 in the example above does this job, taking advantage of
+ the STMCUR pointer used by BASIC, which holds the address of the
+ line of tokenized code currently being executed.
+
+ **unprotbas** fixes this simply by calculating what the pointer
+ should be (based on the tokens in the line) and changing it. No
+ information is lost by doing this.
+
+One more thing **unprotbas** can do is remove extra data from the end
+of the file. It's possible for BASIC files to contain extra data that
+occurs after the end of the program. Some programs use this as a way
+to load arbitrary binary data into memory along with the program; for
+other programs, the extra data is truly garbage (e.g. an EOF character
+if the file came from a CP/M system, or padding to a block size if a
+dumb implementation of XMODEM was used to transfer the file).
+
+Normally, such "garbage" doesn't hurt anything. BASIC ignores it. Or
+it normally does... if you suspect it's causing a problem, you can
+remove it with the **-g** option. If removing the "garbage" causes the
+program to fail to run, it wasn't garbage! **unprotbas** doesn't
+remove extra data by default, to be on the safe side.
+
+.. include:: manftr.rst
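The renaming scheme described under "Variable name table scrambling" can be sketched in C. make_varname is a hypothetical helper, not part of unprotbas. It is modeled on the loop in fixvars() and uses the same type codes (0 = scalar, 1 = array, 2 = string, from the top two bits of the variable value table entry). Each type is counted separately. Names run A to Z, then A1 to A9, B1 to B9, and so on. The last byte of each name marks the type: bit 7 is set on a scalar's last character, and string and array names end in an inverse $ ($A4) or inverse ( ($A8).

```c
/* Hypothetical helper modeled on fixvars(): write the generated name
   for the n-th variable of a given type (0 = scalar, 1 = array,
   2 = string) into out[] in variable-name-table form, and return its
   length in bytes (at most 3). */
int make_varname(int n, int type, unsigned char *out) {
    int len = 0;

    if (n < 26) {
        out[len++] = (unsigned char)('A' + n);        /* A..Z */
    } else {
        n -= 26;
        out[len++] = (unsigned char)('A' + n / 9);    /* A1..A9, B1.. */
        out[len++] = (unsigned char)('1' + n % 9);
    }

    if (type == 1)
        out[len++] = 0xA8;      /* inverse "(" marks an array */
    else if (type == 2)
        out[len++] = 0xA4;      /* inverse "$" marks a string */
    else
        out[len - 1] |= 0x80;   /* scalar: invert the last character */

    return len;
}
```

Because each type has its own counter, the first scalar, array and string all start with A, which is why the generated table contains A, A( and A$ side by side.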