=========
unprotbas
=========

---------------------------------------------------
Unprotect LIST-protected Atari 8-bit BASIC programs
---------------------------------------------------

.. include:: manhdr.rst

SYNOPSIS
========

unprotbas [**-v**] [**-f**] [**-n**] [**-g**] [**-c**] **input-file** **output-file**

DESCRIPTION
===========

**unprotbas** reads a LIST-protected Atari 8-bit BASIC program and
writes a new, unprotected copy. See **DETAILS**, below, to understand
how the protection and unprotection work.

**input-file** must be a tokenized (SAVEd) Atari BASIC program. Use
*-* to read from standard input.

**output-file** will be the unprotected tokenized BASIC program. If it
already exists, it will be overwritten. Use *-* to write to standard
output, but **unprotbas** will refuse to write to standard output if
it's a terminal (since tokenized BASIC is binary data and may confuse
the terminal).

OPTIONS
=======

Option bundling is not supported: use e.g. **-v -f**, not **-vf**.
To use filenames beginning with *-*, write them as *./-file*, or they
will be treated as options.

**-v**
  Verbose operation.

**-f**
  Force the variable name table to be rebuilt, even if it looks OK.
  This option cannot be combined with **-n**.

**-n**
  Don't rebuild the variable table (only fix the line pointers, if
  needed). This option cannot be combined with **-f**.

**-g**
  Remove any "garbage" data from the end of the file. By default,
  it's left as-is, in case it's actually data used by the program.

**-c**
  Check only (dry run). Loads the program and unprotects it in
  memory, but doesn't write the result anywhere. In this mode, there
  is no **output-file**.

EXIT STATUS
===========

0
  **input-file** was protected, and unprotection was successful.

1
  I/O error, or **input-file** isn't a valid BASIC program.

2
  **input-file** is already an unprotected BASIC program.
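
As a rough illustration of how a script might use these exit statuses,
here is a minimal Python sketch that dry-runs **unprotbas** on a file
with **-c**. The helper names (``classify``, ``check_file``) are
hypothetical, and the sketch assumes the statuses listed above are also
returned in check-only mode, which this page does not state explicitly.

```python
import subprocess

# Meanings copied from the EXIT STATUS list above.
EXIT_MEANINGS = {
    0: "protected (unprotection succeeded)",
    1: "I/O error or not a valid BASIC program",
    2: "already unprotected",
}


def classify(returncode):
    """Map an unprotbas exit status to a short description."""
    return EXIT_MEANINGS.get(returncode, "unexpected exit status %d" % returncode)


def check_file(path, program="unprotbas"):
    """Dry-run unprotbas on one file and describe the outcome.

    Assumption: the documented exit statuses also apply with -c.
    Paths beginning with "-" should be passed as "./-file".
    """
    result = subprocess.run([program, "-c", path])
    return classify(result.returncode)
```

Calling ``check_file("GAME.BAS")`` would then return one of the three
descriptions above, without writing any output file.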

DETAILS
=======

In the Atari BASIC world, it's possible to create a SAVEd (tokenized)
program that can be RUN from disk (**RUN "D:FILE.BAS"**), but that
either crashes the BASIC interpreter or LISTs as gibberish if it's
LOADed. This is known as LIST-protection. Such programs are
generally released to the world in protected form; the author
privately keeps an unprotected copy in order to modify it. Today,
collections such as the Holmes Archive contain many
LIST-protected programs, for which the unprotected version was never
released.

One example of LIST-protection, taken from *Mapping the Atari* (the
**STMCUR** entry in the memory map), looks like::

  32000 FOR VARI=PEEK(130)+PEEK(131)*256 TO PEEK(132)+PEEK(133)*256:POKE VARI,155:NEXT VARI
  32100 POKE PEEK(138)+PEEK(139)*256+2,0:SAVE "D:filename":NEW

To use it, add these two lines of code to your program, then execute
them with **GOTO 32000** in immediate mode.

This illustrates both types of protection, which can be (and usually
are) applied to the same program:

Variable name table scrambling
  BASIC has specific rules on what are and aren't considered legal
  variable names, which are enforced by the tokenization process,
  at program entry time. However, it doesn't use the variable names
  at runtime, when the tokenized file is interpreted.

  Replacing the variable names with binary gibberish will render the
  program LIST-proof, either replacing every variable name with the
  same control character, or causing LIST to display a long string of
  binary garbage for each variable name... but the program will still
  RUN correctly. Note that the original variable names are *gone*,
  and cannot be recovered.

  Line 32000 in the example above does this job, replacing every
  variable name with the EOL character (155).

  **unprotbas** detects a scrambled variable name table, and builds
  a new one that's valid. However, since there are no real variable
  names in the program, the recovery process just invents new ones,
  named A through Z, A1 through A9, B1 through B9, and so on. Since
  these names are meaningless, a human will have to work out what
  each variable is for.

  The **output-file** may not be the exact size that the
  **input-file** was. Some types of variable-name scrambling shrink
  the variable name table to the minimum size (one byte per name), so
  the rebuilt table will be larger. Other types of scrambling leave
  the variable name table at its original size, but **unprotbas**
  generates only one- and two-character variable names, so the rebuilt
  table might be smaller.

Bad next-line pointer
  Every line of tokenized BASIC contains a line length byte, which
  BASIC uses as an offset to the next line of code. Before printing
  the READY prompt, BASIC iterates over every line of code in the
  program, using these next-line pointers, in order to delete any
  existing line 32768 (the previous immediate mode command). If any
  line's length byte is set to zero, its pointer points back to the
  line itself.

  When BASIC tries to traverse a line of code that points to itself as
  its "next" line, it will get stuck in an infinite loop. This not only
  prevents LIST, it actually prevents any immediate mode command:
  after LOADing such a file, *nothing* will work (even pressing RESET
  won't get you out of it). The only way to use such a program is to
  use the RUN command with a filename, and if the program ever exits
  (due to END, STOP, an error, or the Break key), BASIC will get stuck
  again.

  This doesn't *have* to be done with the last line in the
  program. The "poisoned" line could be followed by more lines of
  code, though they could never actually execute.

  Line 32100 in the example above does this job, taking advantage of
  the STMCUR pointer used by BASIC, which holds the address of the
  line of tokenized code currently being executed.

  **unprotbas** fixes this simply by calculating what the pointer
  should be (based on the tokens in the line) and changing it. No
  information is lost by doing this.
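
The two repairs described above can be sketched in Python. This is an
illustration only, not the actual **unprotbas** code: the function
names are hypothetical, and the line layout (two-byte line number,
length byte, then statements that each start with an offset byte) and
the terminator values 0x16 and 0x9B are assumptions about the tokenized
format that this page does not spell out. Type suffixes for string and
array variable names are not handled.

```python
import string

END_OF_LINE = 0x16  # assumed: token that ends the last statement of a line
EOL_CHAR = 0x9B     # assumed: ATASCII EOL ending REM/DATA text


def generated_names():
    """Yield A..Z, then A1..A9, B1..B9, and so on (260 names in total)."""
    for letter in string.ascii_uppercase:
        yield letter
    for letter in string.ascii_uppercase:
        for digit in "123456789":
            yield letter + digit


def true_line_length(code, start):
    """Recompute one line's length by following its statement offsets."""
    stmt = start + 3  # first statement follows the 3-byte line header
    while True:
        next_stmt = start + code[stmt]  # offsets count from the line start
        if next_stmt <= stmt or next_stmt > len(code):
            raise ValueError("corrupt statement offset")
        if code[next_stmt - 1] in (END_OF_LINE, EOL_CHAR):
            return next_stmt - start
        stmt = next_stmt


def fix_line_pointers(code):
    """Return (repaired copy of code, line numbers that were repaired)."""
    code = bytearray(code)
    repaired = []
    pos = 0
    while pos + 3 <= len(code):
        line_number = code[pos] | (code[pos + 1] << 8)
        if code[pos + 2] == 0:  # zero length byte: line points to itself
            code[pos + 2] = true_line_length(code, pos)
            repaired.append(line_number)
        pos += code[pos + 2]
    return bytes(code), repaired
```

Here ``code`` stands for the tokenized lines only, without the variable
tables that precede them in the file.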

One more thing **unprotbas** can do is remove extra data from the end
of the file. It's possible for BASIC files to contain extra data that
occurs after the end of the program. Some programs use this as a way
to load arbitrary binary data into memory along with the program; for
other programs, the extra data is truly garbage (e.g. an EOF character
if the file came from a CP/M system, or padding to a block size if a
dumb implementation of XMODEM was used to transfer the file).

Normally, such "garbage" doesn't hurt anything, because BASIC ignores
it. If you suspect it's causing a problem anyway, you can
remove it with the **-g** option. If removing the "garbage" causes the
program to fail to run, it wasn't garbage! **unprotbas** doesn't
remove extra data by default, to be on the safe side.
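
One way to find where the program ends, and therefore where any extra
data begins, is sketched below in Python. It rests on an assumption not
stated on this page: that a SAVEd file starts with seven 16-bit
little-endian pointers (LOMEM, VNTP, VNTD, VVTP, STMTAB, STMCUR, STARP)
and that the program proper occupies the bytes from VNTP up to STARP.
The function name ``split_trailing_data`` is hypothetical.

```python
import struct

HEADER_SIZE = 14  # assumed: seven 16-bit little-endian pointers


def split_trailing_data(data):
    """Split a SAVEd file image into (program, extra_data)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to be a tokenized BASIC file")
    pointers = struct.unpack("<7H", data[:HEADER_SIZE])
    vntp, starp = pointers[1], pointers[6]
    if starp < vntp:
        raise ValueError("STARP is below VNTP")
    end = HEADER_SIZE + (starp - vntp)  # assumed end of the program
    if end > len(data):
        raise ValueError("file is shorter than its header claims")
    return data[:end], data[end:]
```

If ``extra_data`` comes back empty, there is nothing for **-g** to
remove.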

.. include:: manftr.rst