-rw-r--r-- | Makefile | 4
-rw-r--r-- | blob2xex.1 | 12
-rw-r--r-- | blob2xex.rst | 10
-rw-r--r-- | unprotbas.1 | 213
-rw-r--r-- | unprotbas.c | 429
-rw-r--r-- | unprotbas.rst | 156
6 files changed, 817 insertions, 7 deletions
@@ -16,9 +16,9 @@ CC=gcc CFLAGS=-Wall $(COPT) -ansi -D_GNU_SOURCE -DVERSION=\"$(VERSION)\" # BINS and SCRIPTS go in $BINDIR, DOCS go in $DOCDIR -BINS=a8eol xfd2atr atr2xfd blob2c cart2xex fenders xexsplit xexcat atrsize rom2cart unmac65 axe blob2xex xexamine xex1to2 +BINS=a8eol xfd2atr atr2xfd blob2c cart2xex fenders xexsplit xexcat atrsize rom2cart unmac65 axe blob2xex xexamine xex1to2 unprotbas SCRIPTS=dasm2atasm a8utf8 -MANS=a8eol.1 xfd2atr.1 atr2xfd.1 blob2c.1 cart2xex.1 fenders.1 xexsplit.1 xexcat.1 atrsize.1 rom2cart.1 unmac65.1 axe.1 dasm2atasm.1 a8utf8.1 blob2xex.1 xexamine.1 xex1to2.1 +MANS=a8eol.1 xfd2atr.1 atr2xfd.1 blob2c.1 cart2xex.1 fenders.1 xexsplit.1 xexcat.1 atrsize.1 rom2cart.1 unmac65.1 axe.1 dasm2atasm.1 a8utf8.1 blob2xex.1 xexamine.1 xex1to2.1 unprotbas.1 MAN5S=xex.5 MAN7S=atascii.7 DOCS=README equates.inc *.dasm LICENSE ksiders/atr.txt @@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. -.TH "BLOB2XEX" 1 "2024-05-16" "0.2.1" "Urchlay's Atari 8-bit Tools" +.TH "BLOB2XEX" 1 "2024-05-17" "0.2.1" "Urchlay's Atari 8-bit Tools" .SH NAME blob2xex \- Create Atari 8-bit executables from arbitrary data .\" RST source for blob2xex(1) man page. Convert with: @@ -69,6 +69,10 @@ filename, anyway? .sp Addresses, offsets, and sizes may be given in decimal or hex. Hex addresses must be prefixed with either \fB$\fP or \fB0x\fP\&. +.sp +It\(aqs impossible to create a segment that would wrap around the Atari\(aqs +64KB address space. Once address \fB$FFFF\fP is reached, no more data is +read from the input file. .SH OPTIONS .sp A space is required between an option and its argument; use e.g. \fB\-l 0x2000\fP, @@ -127,11 +131,13 @@ created. There are only a few possible warnings: .INDENT 0.0 .TP .B start/end address XXXX loads into ROM. 
-This means your .exe file\(aqs start/end addresses will load the +This means your .xex file\(aqs start/end addresses will load the file into ROM (or the unmapped area at \fB$C000\fP on a 400/800). Normally this means the .xex file won\(aqt load properly on the Atari, but feel free to ignore this warning if you know exactly -what you\(aqre doing. +what you\(aqre doing. Example: if your .xex file is intended to +be loaded on an 800 with an Axlon memory upgrade, mapped at +\fB$C000\fP, this warning can be ignored. .TP .B extra arguments after last input file ignored. You gave at least one option that would affect the next file, diff --git a/blob2xex.rst b/blob2xex.rst index 0921886..a35bf65 100644 --- a/blob2xex.rst +++ b/blob2xex.rst @@ -50,6 +50,10 @@ filename, anyway? Addresses, offsets, and sizes may be given in decimal or hex. Hex addresses must be prefixed with either **$** or **0x**. +It's impossible to create a segment that would wrap around the Atari's +64KB address space. Once address **$FFFF** is reached, no more data is +read from the input file. + OPTIONS ======= @@ -104,11 +108,13 @@ Messages containing *warning* are non-fatal, and the output file is created. There are only a few possible warnings: start/end address XXXX loads into ROM. - This means your .exe file's start/end addresses will load the + This means your .xex file's start/end addresses will load the file into ROM (or the unmapped area at **$C000** on a 400/800). Normally this means the .xex file won't load properly on the Atari, but feel free to ignore this warning if you know exactly - what you're doing. + what you're doing. Example: if your .xex file is intended to + be loaded on an 800 with an Axlon memory upgrade, mapped at + **$C000**, this warning can be ignored. extra arguments after last input file ignored. 
You gave at least one option that would affect the next file, diff --git a/unprotbas.1 b/unprotbas.1 new file mode 100644 index 0000000..92e6b66 --- /dev/null +++ b/unprotbas.1 @@ -0,0 +1,213 @@ +.\" Man page generated from reStructuredText. +. +. +.nr rst2man-indent-level 0 +. +.de1 rstReportMargin +\\$1 \\n[an-margin] +level \\n[rst2man-indent-level] +level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] +- +\\n[rst2man-indent0] +\\n[rst2man-indent1] +\\n[rst2man-indent2] +.. +.de1 INDENT +.\" .rstReportMargin pre: +. RS \\$1 +. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] +. nr rst2man-indent-level +1 +.\" .rstReportMargin post: +.. +.de UNINDENT +. RE +.\" indent \\n[an-margin] +.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] +.nr rst2man-indent-level -1 +.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] +.in \\n[rst2man-indent\\n[rst2man-indent-level]]u +.. +.TH "UNPROTBAS" 1 "2024-05-17" "0.2.1" "Urchlay's Atari 8-bit Tools" +.SH NAME +unprotbas \- Unprotect LIST-protected Atari 8-bit BASIC programs +.SH SYNOPSIS +.sp +unprotbas [\fB\-v\fP] [\fB\-f\fP] [\fB\-n\fP] [\fB\-g\fP] \fBinput\-file\fP \fBoutput\-file\fP +.SH DESCRIPTION +.sp +\fBunprotbas\fP modifies LIST\-protected Atari 8\-bit BASIC programs, +creating a new non\-protected copy. See \fBDETAILS\fP, below, to +understand how the protection and unprotection works. +.sp +\fBinput\-file\fP must be a tokenized Atari BASIC program. Use \fI\-\fP to +read from standard input. +.sp +\fBoutput\-file\fP will be the unprotected tokenized BASIC program. If it +already exists, it will be overwritten. Use \fI\-\fP to write to standard +output, but \fB[TODO]\fP \fBunprotbas\fP will refuse to write to standard +output if it\(aqs a terminal (since tokenized BASIC is binary data and +may confuse the terminal). +.SH OPTIONS +.INDENT 0.0 +.TP +.B \fB\-v\fP +Verbose operation. +.TP +.B \fB\-f\fP +Force the variable name table to be rebuilt, even if it looks OK. 
+.TP +.B \fB\-n\fP +Don\(aqt rebuild the variable table (only fix the line pointers, if +needed). +.TP +.B \fB\-g\fP +Remove any "garbage" data from the end of the file. By default, +it\(aqs left as\-is, in case it\(aqs actually data used by the program. +.UNINDENT +.SH EXIT STATUS +.sp +Exit status is zero for success, non\-zero for failure. +.SH DETAILS +.sp +In the Atari BASIC world, it\(aqs possible to create a SAVEd (tokenized) +program that can be RUN from disk (\fBRUN "D:FILE.BAS"\fP) but if +it\(aqs LOADed, it will either crash the BASIC interpreter, or LIST +as gibberish. This is known as LIST\-protection. Such programs are +generally released to the world in protected form; the author +privately keeps an unprotected copy so he can modify it. In +later days, collections such as the Holmes Archive contain many +LIST\-protected programs, for which the unprotected version was never +released. +.sp +One example of LIST\-protection, taken from \fIMapping the Atari\fP (the +\fBSTMCUR\fP entry in the memory map) looks like: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI +32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +To use, add the 2 lines of code to your program, then execute them +with \fBGOTO 32000\fP in immediate mode. +.sp +This illustrates both types of protection, which can be (and usually +are) applied to the same program: +.INDENT 0.0 +.TP +.B Variable name table scrambling +BASIC has specific rules on what are and aren\(aqt considered legal +variable names, which are enforced by the tokenization process, +at program entry time. However, it doesn\(aqt use the variable names +at runtime, when the tokenized file is interpreted. 
+.sp +Replacing the variable names with binary gibberish will render the +program LIST\-proof, either replacing every variable name with the +same control character, or causing LIST to display a long string of +binary garbage for each variable name... but the program will still +RUN correctly. Note that the original variable names are \fIgone\fP, +and cannot be recovered. +.sp +Line 32000 in the example above does this job, replacing every +variable name with the EOL character (155). +.sp +\fBunprotbas\fP detects a scrambled variable name table, and builds +a new one that\(aqs valid. However, since there are no real variable +names in the program, the recovery process just invents new ones, +named A through Z, A1 through A9, B1 through B9, etc, etc. It\(aqll +require human intelligence to figure out what each variable is for, +since the names are meaningless. +.sp +The \fBoutput\-file\fP may be larger than the \fBinput\-file\fP was, since +some types of variable\-name scrambling shrink the variable name +table to the minimum size (one byte per name); the rebuilt table +will be larger. +.TP +.B Bad next\-line pointer +Generally, this is done with line number 32768. Yes, this line +number is outside the range BASIC accepts... but BASIC uses it +internally for immediate\-mode commands. And when SAVE or CSAVE are +executed, this line gets saved, too. +.sp +Every line of tokenized BASIC contains a line length byte, which +BASIC uses as a pointer to the next line of code. Before printing +the READY prompt, BASIC iterates over every line of code in the +program, using the next\-line pointers, in order to delete any +existing line 32768 (the previous immediate mode command). If any +line\(aqs pointer is set to zero, that means it points to itself. +.sp +When BASIC tries to traverse a line of code that points to itself as +"next" line, it will get stuck in an infinite loop. 
This not only +prevents LIST, it actually prevents any immediate mode command: +after LOADing such a file, \fInothing\fP will work (even pressing RESET +won\(aqt get you out of it). The only way to use such a program is to +use the RUN command with a filename, and if the program ever exits +(due to END, STOP, an error, or the Break key), BASIC will get stuck +again. +.sp +This doesn\(aqt \fIhave\fP to be done with line 32768. Any line of code +that doesn\(aqt have to be traversed at runtime would work (in other +words, a regular line whose line number is higher than any code that +ever gets executed, usually the last line in the file). +.sp +Line 32100 in the example above does this job, taking advantage of +the STMCUR pointer used by BASIC, which holds the address of the +line of tokenized code currently being executed. +.sp +\fBunprotbas\fP fixes this simply by calculating what the pointer +should be (based on the tokens in the line) and changing it. No +information is lost by doing this. +.UNINDENT +.sp +One more thing \fBunprotbas\fP can do is remove extra data from the end +of the file. It\(aqs possible for BASIC files to contain extra data that +occurs after the end of the program. Some programs use this as a way +to load arbitrary binary data into memory along with the program; for +other programs, the extra data is truly garbage (e.g. an EOF character +if the file came from a CP/M system, or padding to a block size if a +dumb implementation of XMODEM was used to transfer the file). +.sp +Normally, such "garbage" doesn\(aqt hurt anything. BASIC ignores it. Or +it normally does... if you suspect it\(aqs causing a problem, you can +remove it with the \fB\-g\fP option. If removing the "garbage" causes the +program to fail to run, it wasn\(aqt garbage! \fBunprotbas\fP doesn\(aqt +remove extra data by default, to be on the safe side. +.SH COPYRIGHT +.sp +WTFPL. See \fI\%http://www.wtfpl.net/txt/copying/\fP for details. +.SH AUTHOR +.INDENT 0.0 +.IP B. 
3 +Watson <\fI\%urchlay@slackware.uk\fP>; Urchlay on irc.libera.chat \fI##atari\fP\&. +.UNINDENT +.SH SEE ALSO +.sp +\fBa8eol\fP(1), +\fBa8utf8\fP(1), +\fBatr2xfd\fP(1), +\fBatrsize\fP(1), +\fBaxe\fP(1), +\fBblob2c\fP(1), +\fBblob2xex\fP(1), +\fBcart2xex\fP(1), +\fBdasm2atasm\fP(1), +\fBf2toxex\fP(1), +\fBfenders\fP(1), +\fBrom2cart\fP(1), +\fBunmac65\fP(1), +\fBxexamine\fP(1), +\fBxexcat\fP(1), +\fBxexsplit\fP(1), +\fBxfd2atr\fP(1), +\fBxex\fP(5), +\fBatascii\fP(7). +.sp +Any good Atari 8\-bit book: \fIDe Re Atari\fP, \fIThe Atari BASIC Reference +Manual\fP, the \fIOS Users\(aq Guide\fP, \fIMapping the Atari\fP, etc. +.\" Generated by docutils manpage writer. +. diff --git a/unprotbas.c b/unprotbas.c new file mode 100644 index 0000000..9c45fbb --- /dev/null +++ b/unprotbas.c @@ -0,0 +1,429 @@ +/**** TODO: + if the rebuilt variable name table ends up larger than the + scrambled one, the rest of the program needs to be moved upwards + in memory to make room for it. currently this isn't done, so + the variable *value* table gets corrupted by the last few + variable names overwriting the first few values. */ + +#include <stdio.h> +#include <unistd.h> +#include <stdlib.h> + +/* attempt to fix a "list-protected" Atari 8-bit BASIC program. + we don't fully detokenize, so this won't fix truly corrupted + files. + + the "fix" is in 2 parts: + 1. fix any invalid (0-byte) offsets after a line number. this is + what causes BASIC to lock up. + 2. if the variable names were overwritten (e.g. with EOL characters, + or whatever), we "fix" that by making up new variable names. +*/ + +#define STM_OFFSET 0xf2 + +/* entire file gets read into memory (for now) */ +unsigned char data[65536]; + +/* BASIC 14-byte header values */ +unsigned short lomem; +unsigned short vntp; +unsigned short vntd; +unsigned short vvtp; +unsigned short stmtab; +unsigned short stmcur; +unsigned short starp; + +/* positions where various parts of the file start, + derived from the header vars above. 
*/ +unsigned short codestart; +unsigned short vnstart; +unsigned short vvstart; +int filelen; + +/* name of executable, taken from argv[0] */ +char *self; + +/* these are set by the various command-line switches */ +int keepvars = 0; +int forcevars = 0; +int keepgarbage = 1; +int verbose = 0; + +/* file handles */ +FILE *input_file = NULL; +FILE *output_file = NULL; + +void die(const char *msg) { + fprintf(stderr, "%s: %s\n", self, msg); + exit(1); +} + +/* read entire file into memory */ +int readfile(void) { + int got = fread(data, 1, 65535, input_file); + fprintf(stderr, "read %d bytes\n", got); + return got; +} + +/* get a 16-bit value from the file, in 6502 LSB/MSB order. */ +unsigned short getword(int addr) { + return data[addr] | (data[addr + 1] << 8); +} + +/* fixline() calculates & sets correct line length, by iterating + over the statement(s) within the line. the last statement's + offset will be the same as the line offset should have been, + if it weren't zeroed. when reading this code, it's helpful to + know that the lengths (line and statement) are counted from the + start of the line in memory. + + A line with only a line number and one token (such as END) would have a + line length of 6: 2 for the 16-bit line number, 1 for the length byte + itself, 1 for the statement length byte (also 6), 1 for the END token, and one + for the end-of-line token. + + A line with two statements: 10 ?:END + offset value meaning + 0 0A line number (low byte) + 1 00 line number (high byte) + 2 09 line length (or, offset to next line) [!] + 3 06 offset to next statement *from the start of the line* + 4 28 token for "?" + 5 14 token for : (end of statement) + 6 09 offset to next statement [!] + 7 15 token for END + 8 16 token for end-of-line [*] + 9 ?? (line number of next statement) + + Note the values marked with [!] are equal. + + [*] end-of-line is $16 *except* for REM and DATA, which are + terminated with $9B instead. 
+*/ +int fixline(int linepos) { + /* +3 here to skip the line number + line length */ + int token, done = 0, offset = data[linepos + 3]; + + while(!done) { + /* check the terminator of the current statement before advancing, + so a single-statement line is handled too */ + token = data[linepos + offset - 1]; + fprintf(stderr, "offset %02x token %02x\n", offset, token); + if(token == 0x14) + offset = data[linepos + offset]; + else + done++; + } + + data[linepos + 2] = offset; + return offset; +} + +/* Iterate over all the tokenized lines. If any of them have invalid + line lengths (<=5), call fixline() on them. */ +int fixcode(void) { + int result = 0; + int pos = codestart; + int offset, lineno = -1, tmpno; + + while(pos < filelen) { + tmpno = getword(pos); + if(tmpno <= lineno) { + fprintf(stderr, "Warning: line number %d at offset %04x is <= previous line number %d\n", + tmpno, pos, lineno); + } + lineno = tmpno; + + offset = data[pos + 2]; + /* fprintf(stderr, "pos %d, line #%d, offset %d\n", pos, lineno, offset); */ + if(offset < 6) { + fprintf(stderr, "Found invalid offset %d (<6) at line %d\n", offset, lineno); + offset = fixline(pos); + result++; + } + pos += offset; + + /* Atari BASIC tolerates garbage after the last tokenized line, + so we must do likewise. */ + if(lineno == 32768) break; + } + + fprintf(stderr, "End program pos %04x/%d\n", pos, pos); + + if(filelen > pos) { + fprintf(stderr, "trailing garbage at EOF, %d bytes, %s\n", + filelen - pos, (keepgarbage ? "keeping" : "removing")); + if(!keepgarbage) filelen = pos; + } + + return result; +} + +/* Fixing the variables is a bit more work than it seems like + it might be, because the last byte of the name has to match + the type (inverse video "(" for numeric array, inverse "$" for + string, inverse last character of name for scalars). To do + this right, we have to examine the variable value table to + find out the type of each variable. + + Each variable type gets assigned A to Z, then A1 to A9, B1 to B9, + etc.
This means there will be A, A$, and A( variables, which might + be a bit confusing, but we have to keep the generated name table as + short as possible, because we can't extend the size of the table in + the file. + + We can find the actual table size in the file by subtracting VNTP + (start of variable name table) from VNTD (end of variable name table), + and if we run out of space for the generated names, something is + seriously off... + + The maximum number of variable names is 128. If all 128 vars are in + use, the minimum table size is 230 (26 one-letter names, 102 2-letter + or letter+number or one-letter string/array names). + +*/ + +int fixvars(void) { + int vp = vnstart, vv = vvstart; + int strings = 0, arrays = 0, scalars = 0, varname = 0, varnum = 0; + int bad = 0; + + /* See if the variables even need fixing. + + This code is simpler than it should be: it checks that all + characters in the variable name table are valid, but doesn't + check that they're in valid sequences. Example: a variable name + that's just an inverse dollar sign would be considered OK). + Also multiple variables of the same type with the same name + would be OK. + + However, if all the bytes are the same value, even if it's a + valid character, that's correctly detected as invalid. + */ + + if(vntp == vntd) { + fprintf(stderr, "No variables\n"); + return 0; + } + + vp = vnstart + 1; + bad = 1; + while(vp < vvstart - 1) { + if(data[vp] != data[vnstart]) bad = 0; + vp++; + } + + vp = vnstart; + while(vp < vvstart) { + unsigned char c = data[vp]; + fprintf(stderr, "%04x/%04x: %04x\n", vp, vvstart, c); + + /* allow a null byte only at the end of the table! */ + /* if(c == 0 && vp == vvstart - 1) break; */ + /* new rule: treat a null byte as end-of-table, ignore any + junk between it and VNTP. */ + if(c == 0) break; + + vp++; + + /* inverse $ or ( is OK */ + if(c == 0xa4 || c == 0xa8) continue; + + /* numbers and letters are allowed, inverse or normal. 
*/ + c &= 0x7f; + if(c >= 0x30 && c <= 0x39) continue; + if(c >= 0x41 && c <= 0x5a) continue; + + bad++; + break; + } + if(!forcevars && !bad) return 0; + + vp = vnstart; + while(vv < codestart) { + unsigned char sigil = 0; + /* type: scalar = 0, array = 1, string = 2 */ + unsigned char type = data[vv] >> 6; + /* fprintf(stderr, "%04x: %04x, %d\n", vv, data[vv], type); */ + + if(varnum != data[vv+1]) { + fprintf(stderr, "Warning: variable value is corrupt!\n"); + } + varnum++; + + switch(type) { + case 1: varname = arrays++; sigil = 0xa8; break; + case 2: varname = strings++; sigil = 0xa4; break; + default: varname = scalars++; break; + } + + if(varname < 26) { + data[vp] = ('A' + varname); + } else { + varname -= 26; + data[vp++] = 'A' + (varname / 9); + data[vp] = ('1' + (varname % 9)); + } + + if(sigil) { + vp++; + data[vp++] = sigil; + } else { + data[vp] |= 0x80; + vp++; + } + + vv += 8; + } + + /* there's supposed to be a null byte at the end of the table, unless + all 128 table slots are used. 
*/ + if(varnum < 128) data[vp] = 0; + + /* fixup the VNTD pointer */ + vntd = vntp + (vp - vnstart); + data[4] = vntd & 0xff; + data[5] = vntd >> 8; + + fprintf(stderr, "%d variables, VNTD adjusted to %04x\n", varnum, vntd); + return 1; +} + +void print_help(void) { + fprintf(stderr, "Usage: %s [-v] [-f] [-n] [-g] <inputfile> <outputfile>\n", self); + fprintf(stderr, "-v: verbose\n"); + fprintf(stderr, "-f: force variable name table rebuild\n"); + fprintf(stderr, "-n: do not rebuild variable name table, even if it's invalid\n"); + fprintf(stderr, "-g: remove trailing garbage, if present\n"); + fprintf(stderr, "Use - as a filename to read from stdin and/or write to stdout\n"); +} + +void invalid_args(const char *arg) { + fprintf(stderr, "%s: Invalid argument '%s'\n\n", self, arg); + print_help(); + exit(1); +} + +FILE *open_file(const char *name, const char *mode) { + FILE *fp; + if(!(fp = fopen(name, mode))) { + perror(name); + exit(1); + } + return fp; +} + +void open_input(const char *name) { + if(!name) { + if(freopen(NULL, "rb", stdin)) { + input_file = stdin; + return; + } else { + perror("stdin"); + exit(1); + } + } + + input_file = open_file(name, "rb"); +} + +void open_output(const char *name) { + if(!name) { + if(freopen(NULL, "wb", stdout)) { + output_file = stdout; + return; + } else { + perror("stdout"); + exit(1); + } + } + + output_file = open_file(name, "wb"); +} + +void parse_args(int argc, char **argv) { + self = *argv; + if(argc < 2) { + print_help(); + exit(0); + } + while(++argv, --argc) { + if((*argv)[0] == '-') { + switch((*argv)[1]) { + case 'v': verbose++; break; + case 'f': forcevars++; break; + case 'n': keepvars++; break; + case 'g': keepgarbage = 0; break; + case 0: + if(!input_file) + open_input(NULL); + else if(!output_file) + open_output(NULL); + else + invalid_args(*argv); + break; + default: invalid_args(*argv); break; + } + } else { + if(!input_file) + open_input(*argv); + else if(!output_file) + open_output(*argv); + else + 
invalid_args(*argv); + } + } + + if(!input_file) die("no input file given (use - for stdin)"); + if(!output_file) die("no output file given (use - for stdout)"); + if(keepvars && forcevars) die("-f and -n are mutually exclusive"); +} + +int main(int argc, char **argv) { + parse_args(argc, argv); + + filelen = readfile(); + + lomem = getword(0); + vntp = getword(2); + vntd = getword(4); + vvtp = getword(6); + stmtab = getword(8); + stmcur = getword(10); + starp = getword(12); + codestart = stmtab - STM_OFFSET - (vntp - 256); + vnstart = vntp - 256 + 14; + vvstart = vvtp - 256 + 14; + + if(lomem) die("This doesn't look like an Atari BASIC program (no $0000 signature)"); + + fprintf(stderr, "LOMEM %04x\n", lomem); + fprintf(stderr, "VNTP %04x\n", vntp); + fprintf(stderr, "VNTD %04x\n", vntd); + fprintf(stderr, "VVTP %04x\n", vvtp); + fprintf(stderr, "STMTAB %04x, codestart %04x\n", stmtab, codestart); + fprintf(stderr, "STMCUR %04x\n", stmcur); + fprintf(stderr, "STARP %04x\n", starp); + fprintf(stderr, "vvstart %04x\n", vvstart); + + /* + fprintf(stderr, "data at STMTAB (we hope):\n"); + for(int i=codestart; i<filelen; i++) { + fprintf(stderr, "%02x ", data[i]); + } + fprintf(stderr, "\n"); + */ + + if(!keepvars) { + if(fixvars()) + fprintf(stderr, "Variable names replaced\n"); + else + fprintf(stderr, "Variable names were already OK\n"); + } + + if(fixcode()) + fprintf(stderr, "Fixed invalid offset in code\n"); + else + fprintf(stderr, "No invalid offsets (maybe wasn't protected?)\n"); + + fwrite(data, filelen, 1, output_file); + return 0; +} diff --git a/unprotbas.rst b/unprotbas.rst new file mode 100644 index 0000000..735681e --- /dev/null +++ b/unprotbas.rst @@ -0,0 +1,156 @@ +========= +unprotbas +========= + +--------------------------------------------------- +Unprotect LIST-protected Atari 8-bit BASIC programs +--------------------------------------------------- + +.. 
include:: manhdr.rst + +SYNOPSIS +======== + +unprotbas [**-v**] [**-f**] [**-n**] [**-g**] **input-file** **output-file** + +DESCRIPTION +=========== + +**unprotbas** modifies LIST-protected Atari 8-bit BASIC programs, +creating a new non-protected copy. See **DETAILS**, below, to +understand how the protection and unprotection works. + +**input-file** must be a tokenized Atari BASIC program. Use *-* to +read from standard input. + +**output-file** will be the unprotected tokenized BASIC program. If it +already exists, it will be overwritten. Use *-* to write to standard +output, but **[TODO]** **unprotbas** will refuse to write to standard +output if it's a terminal (since tokenized BASIC is binary data and +may confuse the terminal). + +OPTIONS +======= + +**-v** + Verbose operation. + +**-f** + Force the variable name table to be rebuilt, even if it looks OK. + +**-n** + Don't rebuild the variable table (only fix the line pointers, if + needed). + +**-g** + Remove any "garbage" data from the end of the file. By default, + it's left as-is, in case it's actually data used by the program. + +EXIT STATUS +=========== + +Exit status is zero for success, non-zero for failure. + +DETAILS +======= + +In the Atari BASIC world, it's possible to create a SAVEd (tokenized) +program that can be RUN from disk (**RUN "D:FILE.BAS"**) but if +it's LOADed, it will either crash the BASIC interpreter, or LIST +as gibberish. This is known as LIST-protection. Such programs are +generally released to the world in protected form; the author +privately keeps an unprotected copy so he can modify it. In +later days, collections such as the Holmes Archive contain many +LIST-protected programs, for which the unprotected version was never +released. 
+ +One example of LIST-protection, taken from *Mapping the Atari* (the +**STMCUR** entry in the memory map) looks like:: + + 32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI + 32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW + +To use, add the 2 lines of code to your program, then execute them +with **GOTO 32000** in immediate mode. + +This illustrates both types of protection, which can be (and usually +are) applied to the same program: + +Variable name table scrambling + BASIC has specific rules on what are and aren't considered legal + variable names, which are enforced by the tokenization process, + at program entry time. However, it doesn't use the variable names + at runtime, when the tokenized file is interpreted. + + Replacing the variable names with binary gibberish will render the + program LIST-proof, either replacing every variable name with the + same control character, or causing LIST to display a long string of + binary garbage for each variable name... but the program will still + RUN correctly. Note that the original variable names are *gone*, + and cannot be recovered. + + Line 32000 in the example above does this job, replacing every + variable name with the EOL character (155). + + **unprotbas** detects a scrambled variable name table, and builds + a new one that's valid. However, since there are no real variable + names in the program, the recovery process just invents new ones, + named A through Z, A1 through A9, B1 through B9, etc, etc. It'll + require human intelligence to figure out what each variable is for, + since the names are meaningless. + + The **output-file** may be larger than the **input-file** was, since + some types of variable-name scrambling shrink the variable name + table to the minimum size (one byte per name); the rebuilt table + will be larger. + +Bad next-line pointer + Generally, this is done with line number 32768. 
Yes, this line + number is outside the range BASIC accepts... but BASIC uses it + internally for immediate-mode commands. And when SAVE or CSAVE are + executed, this line gets saved, too. + + Every line of tokenized BASIC contains a line length byte, which + BASIC uses as a pointer to the next line of code. Before printing + the READY prompt, BASIC iterates over every line of code in the + program, using the next-line pointers, in order to delete any + existing line 32768 (the previous immediate mode command). If any + line's pointer is set to zero, that means it points to itself. + + When BASIC tries to traverse a line of code that points to itself as + "next" line, it will get stuck in an infinite loop. This not only + prevents LIST, it actually prevents any immediate mode command: + after LOADing such a file, *nothing* will work (even pressing RESET + won't get you out of it). The only way to use such a program is to + use the RUN command with a filename, and if the program ever exits + (due to END, STOP, an error, or the Break key), BASIC will get stuck + again. + + This doesn't *have* to be done with line 32768. Any line of code + that doesn't have to be traversed at runtime would work (in other + words, a regular line whose line number is higher than any code that + ever gets executed, usually the last line in the file). + + Line 32100 in the example above does this job, taking advantage of + the STMCUR pointer used by BASIC, which holds the address of the + line of tokenized code currently being executed. + + **unprotbas** fixes this simply by calculating what the pointer + should be (based on the tokens in the line) and changing it. No + information is lost by doing this. + +One more thing **unprotbas** can do is remove extra data from the end +of the file. It's possible for BASIC files to contain extra data that +occurs after the end of the program. 
Some programs use this as a way +to load arbitrary binary data into memory along with the program; for +other programs, the extra data is truly garbage (e.g. an EOF character +if the file came from a CP/M system, or padding to a block size if a +dumb implementation of XMODEM was used to transfer the file). + +Normally, such "garbage" doesn't hurt anything. BASIC ignores it. Or +it normally does... if you suspect it's causing a problem, you can +remove it with the **-g** option. If removing the "garbage" causes the +program to fail to run, it wasn't garbage! **unprotbas** doesn't +remove extra data by default, to be on the safe side. + +.. include:: manftr.rst
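
Note on the next-line pointer repair: the sketch below is one way to recompute a zeroed line-length byte by walking the statement offsets, as described in the fixline() comment and the "Bad next-line pointer" section. It is an illustration only, not the code from this commit. The function name recompute_line_length and its avail parameter are hypothetical, the bounds checks are an added assumption, and the token value $14 (statement separator) is taken from the layout table in unprotbas.c.

```c
#include <stddef.h>

/* ':' statement separator, per the layout table in unprotbas.c */
#define TOK_COLON 0x14

/* Hypothetical helper, not part of unprotbas.c.
   `line` points at the low byte of the line number of one tokenized
   line; `avail` is the number of bytes that may be read from there.
   Walks the statement offsets (each counted from the start of the
   line) until it finds a statement that does not end in a colon,
   then stores that offset as the line length.
   Returns the repaired length, or 0 if the line looks corrupt. */
int recompute_line_length(unsigned char *line, size_t avail) {
    size_t offset, next;

    if (avail < 6) return 0;      /* shortest legal line is 6 bytes */
    offset = line[3];             /* end of the first statement */

    for (;;) {
        /* the offset must lie inside the buffer and past the header */
        if (offset < 6 || offset > avail) return 0;

        /* last byte of the statement: colon means more statements follow */
        if (line[offset - 1] != TOK_COLON) break;

        if (offset >= avail) return 0;
        next = line[offset];      /* end of the following statement */
        if (next <= offset) return 0;   /* would loop forever */
        offset = next;
    }

    line[2] = (unsigned char)offset;
    return (int)offset;
}
```

For the "10 ?:END" line shown in the source comment (bytes 0A 00 00 06 28 14 09 15 16, with the length byte zeroed), this walk ends at offset 9, the value the length byte should have held. Returning 0 on a suspicious offset is a design choice for the sketch, so that a truly corrupted file is reported rather than silently rewritten.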