more dict entries, better comments, fix cart docs

author: B. Watson <yalhcru@gmail.com> 2016-02-26 16:26:32 -0500
committer: B. Watson <yalhcru@gmail.com> 2016-02-26 16:26:32 -0500
commit: b37ac0ede97639931bd540fe34848eb8bf52764b (patch)
tree: fe1127c9d0eb22329a7d96b06f8f884109cb9650
parent: f3b7f8c68e6fe58aad1d093b4efc4eb665ff2788 (diff)
download: taipan-b37ac0ede97639931bd540fe34848eb8bf52764b.tar.gz
3 files changed, 155 insertions, 101 deletions
diff --git a/cart.txt b/cart.txt
index 6b81727..2359c85 100644
--- a/cart.txt
+++ b/cart.txt
@@ -4,15 +4,12 @@ What's needed to get taipan onto a cart:
 joey_z is willing to manufacture carts like this:
 
 +---------------------------------------------------------------------------+
-| Type 13: XEGS 64 KB cartridge (banks 0-7)                                 |
+| Type 12: XEGS 32 KB cartridge                                             |
 +---------------------------------------------------------------------------+
 
- One of the two variants of the 64 KB XEGS cartridge, that's built on either
- a C100649 board with the W1 solder point not connected, or a C026449 board
- with pin 9 of the 74LS374 chip unconnected.
  This bank-switched cartridge occupies 16 KB of address space between $8000
- and $BFFF. The cartridge memory is divided into 8 banks, 8 KB each.
- Bank 7 (the last one) is always mapped to $A000-$BFFF. Three lowest bits of
+ and $BFFF. The cartridge memory is divided into 4 banks, 8 KB each.
+ Bank 3 (the last one) is always mapped to $A000-$BFFF. Two lowest bits of
  a byte written to $D500-$D5FF select the bank mapped to $8000-$9FFF.
  The initially selected bank is random, although it seems that 0 gets chosen
  the most often. Atari800 always selects bank 0 initially.
@@ -31,27 +28,28 @@ the 1200XL.
 
 ...so:
 
-Bank 7 has the startup code, uncompressed title screen, title code, and
-code to copy stuff from the other banks to RAM. The main portion of the
-code gets split up into bank-sized chunks... but the last bank of code
-doesn't need to be copied to RAM: I use a custom linker script to have
-cl65 org the RODATA segment there, plus a new segment called HIGHCODE,
-which is executable code that will run directly from the cartridge. This
-bank stays selected while the game is running, so it also has the font
-in it.
-
-There are currently 3 banks of code (numbers 0, 1, 2) that need to be
-copied to RAM. The bank with RODATA and HIGHCODE is bank 3. Banks 4, 5,
-6 are unused. As I optimize the code, bank 2 will contain less and less
-code, and hopefully will disappear one day (at which point, I only need
-a 32K cart instead of a 64K one).
-
-Code in bank 7 will copy all the chunks to correct place in RAM... and
+Bank 3 has the startup code, uncompressed title screen, title code,
+and code to copy stuff from the banks to RAM, and the tail-end of the
+code that needs to be copied. The main portion of the code gets split up
+into bank-sized chunks... but the last bank of code doesn't need to be
+copied to RAM: I use a custom linker script to have cl65 org the RODATA
+segment there, plus a new segment called HIGHCODE, which is executable
+code that will run directly from the cartridge. This bank stays selected
+while the game is running, so it also has the font in it.
+
+There are currently 2 full banks of code (numbers 0, 1) that need to be
+copied to RAM. The bank with RODATA and HIGHCODE is bank 2.
+
+Code in bank 3 will copy all the chunks to the correct place in RAM... and
 I don't need to leave room for DOS or anything else, so the code can be
 ORGed at $0400 (romable_taimain.raw target in the Makefile does this).
 
-Currently, romable_taimain.raw is 18739 bytes. $0400 + 18739 means the
-code ends at $4d33. The BSS is right after that, and is less than a
+romable_taimain.raw is the code that's org'ed to $400, that gets copied
+to RAM. rodata.8000 is the code/data that is used directly from ROM,
+in bank 2.
+
+Currently, romable_taimain.raw is 17251 bytes. $0400 + 17251 means the
+code ends at $4763. The BSS is right after that, and is less than a
 page. The OS will place the GR.0 display list at $7c20, and the stack
 will grow down from there to $7a20 (except it never grows that much).
 
@@ -65,26 +63,20 @@ Amusingly, the Taipan cart will work on a 32K 800. Not a 16K Atari though.
 any stock 24K Ataris anyway, they'd be 8K or 16K machines with a 3rd-party
 RAM upgrade, and exceedingly rare these days).
 
-bank 7: fixed bank
+bank 3: fixed bank
 $a000-$bxxx - title screen data, dl, menu code,
               memory size checker, code to
               copy romable_taimain to RAM.
-              currently 1441 bytes free in this bank.
+$bxxx-$bfff - tail end of romable_taimain.raw (around 1400 bytes)
 $bffa-$bfff - cart trailer
 
 banks 0, 1: full banks of romable_taimain code
 $8000-$9dff - 31 pages (7936 bytes, 7.75K) of code
 $9f00-$9fff - unused, filled with $ff
 
-bank 2: last (partial) bank of romable_taimain
-$8000-$9f00 - up to 31 pages (7936 bytes, 7.75K) of code.
-              currently only 2809 bytes (11 pages) are used,
-              and the number's getting smaller all the time.
-$9f00-$9fff - unused, filled with $ff
-
-bank 3: RODATA and HIGHCODE segments.
+bank 2: RODATA and HIGHCODE segments.
         This bank stays enabled after copying is done.
-        currently all but 7 bytes are used.
+        currently all but 200-odd bytes are used.
 $8000-$9bff - up to 7K of code/data (28 pages)
 $9c00-$9fff - font (1K). The font has a nonzero byte at $9ffc,
               so the OS doesn't think there's a cartridge here
@@ -94,15 +86,12 @@ Unused areas are filled with $ff (this is the default state for both
 flash and EPROM). For the banks that map at $8000, this includes the
 cart trailer area. A non-zero byte (our $ff) at $9ffc tells the OS that a
 cart isn't inserted in the right slot, so it won't try to initialize/run
-our cart as a right cart. Only bank 7 (that maps as a left cart) has a
+our cart as a right cart. Only bank 3 (that maps as a left cart) has a
 valid cart trailer... according to cart.txt, every once in a while, bank
-7 might come up selected in the $8000 bank at power on. This shouldn't
+3 might come up selected in the $8000 bank at power on. This shouldn't
 matter: it'll be in both bank areas, and if the OS tries to init it as
 a right cart, the init/run addresses will point to the left cart area.
 
-banks 4, 5, 6 are unused (24K total). Possibly the manual goes here,
-if I write one.
-
 --
 
 Changes the game needed for a cart version: Not many.
@@ -117,20 +106,20 @@ all of the game's asm code, and a small amount of the C code, goes in
 a new segment called HIGHCODE (see cartbank3.cfg).
 
 checkmem.s isn't needed any longer... though there is a new memory checker
-(in bank 7) that says "32K required" if someone tries it on a 16K machine.
+(in bank 3) that says "32K required" if someone tries it on a 16K machine.
 
-Exiting the game (N at "Play again?" prompt) returns to the title screen,
-since there's no DOS to exit to. This might be slightly useful: you
-might decide to change the colors or disable sound for your next game.
+Exiting the game (N at "Play again?" prompt) reboots since there's no
+DOS to exit to. You end up at the title screen again.
 
 Pressing the Reset key does a coldstart (reboot) in the cart version.
 You end up at the title screen again.
 
-The title decompression is gone: it just displays the title screen DL
-and data straight from ROM. The menu help text and the tail end of the
-display list are in RAM though, so the menu can change them. Moving the
-end of the DL to RAM means one extra black (border-colored) scanline
-appears due to the DL jump instruction.
+The title decompression is gone: it just displays the title screen DL and
+data straight from ROM. The menu help text and the tail end of the display
+list are in RAM though, so the menu can change them. Moving the end of
+the DL to RAM means one extra black (border-colored) scanline appears
+due to the DL jump instruction. This is the only visible difference
+between the cart and xex builds.
 
 The font is located in ROM, on a 1K boundary, so CHBAS can point
 to it, rather than $2000 like the xex version.
@@ -138,9 +127,6 @@ to it, rather than $2000 like the xex version.
 num_buf and firm are located in the BSS rather than page 6 as they
 are in the .xex version.
 
-Since I have 3 empty banks... Why not include a manual on the cart,
-with pseudo-hypertext UI?
-
 --
 
 What would be *really* slick: figure out a way to split the code up
@@ -156,7 +142,9 @@ in asm? Probably. Do I want to? Not really.
 A 5200 version should be possible. At the moment, the .xex version is
 just under 32K (excluding the memory-checker segment). The 5200 would
 need its own conio, since cc65's conio for 5200 uses a 20x24 GR.1 screen
-and doesn't support input at all.
+and doesn't support input at all. It would also need its own bignum
+library, since the bigfloat one uses the OS's floating point ROM (which
+doesn't exist on the 5200). Probably will do a bigint48 bignum lib.
 
 The 5200 cart window is 32K, so no bankswitching would be
 required. There's only 16K of RAM, but that'll be plenty.
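
The size figures quoted in the cart.txt notes can be cross-checked with a few lines of arithmetic. This is only a sketch, not part of the build: the variable names are made up for illustration, and it assumes each full bank carries exactly 31 pages (7936 bytes) of romable_taimain.raw, as the bank map states.

```python
# Cross-check of the cart.txt numbers. All names here are hypothetical.
ORG = 0x0400              # load address of romable_taimain.raw in RAM
SIZE = 17251              # current size of romable_taimain.raw, in bytes
BANK_PAYLOAD = 31 * 256   # 31 pages of code per full bank (7936 bytes)

end = ORG + SIZE                              # first address past the code
full_banks, tail = divmod(SIZE, BANK_PAYLOAD)

print("code ends at $%04x" % end)
print("%d full banks, %d bytes left for the fixed bank" % (full_banks, tail))
```

The remainder is what the notes call the tail end of romable_taimain.raw (around 1400 bytes) that lives in bank 3.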
diff --git a/messages.pl b/messages.pl
index 0d68f4d..1120a8b 100644
--- a/messages.pl
+++ b/messages.pl
@@ -2,7 +2,11 @@
 
 # compresses messages for taipan.c.
 # messages are listed at the end of this file after __END__ marker.
+# Run with no arguments to encode all messages, in which case the
 # output of this script should be redirected to messages.c.
+# With an argument, encoding is not done: instead, the strings
+# to be encoded are dumped to stdout, *after* dictionary
+# substitution is done.
 
 # make dictionary from textdecomp.s comments
 open my $t, "<textdecomp.s" or die $!;
@@ -35,7 +39,7 @@ while(<DATA>) {
 	s/^\w+\s+//;
 
 	my $orig = $_;
-	#warn "msg: $_\n";
+	print " input: $_\n" if @ARGV;
 	s/"//g;
 	s/\\r//g;
 	s/\\n/\n/g;
@@ -49,9 +53,12 @@ while(<DATA>) {
 		}
 	}
 
-	my $w = $_;
-	$w =~ s/\n/\\n/g;
-	#warn "got: \"$w\"\n";
+	if(@ARGV) {
+		my $w = $_;
+		$w =~ s/\n/\\n/g;
+		print "output: \"$w\"\n\n";
+		next;
+	}
 	open my $out, ">msg.out" or die $!;
 	print $out $_;
 	close $out;
@@ -69,6 +76,8 @@ while(<DATA>) {
 	print "\n};" . ($dict_used ? " // dictionary used" : "") . "\n\n";
 }
 
+exit 0 if @ARGV;
+
 print "// messages:          $msgcount\n";
 print "// total input size:  $total_in\n";
 print "// total output size: $total_out\n";
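
For readers who don't want to trace the Perl, the dictionary substitution step can be modelled roughly as below. This is a sketch in Python, not a port of messages.pl: the function names are hypothetical, the "Z" plus letter notation follows the textdecomp.s comments (dict01 to dict26 are Za thru Zz, dict27 and up are ZA, ZB, etc.), and it ignores details such as messages that contain a literal Z.

```python
def index_letter(num):
    """Letter that stands for dictionary entry `num` after the Z escape."""
    if 1 <= num <= 26:
        return chr(ord("a") + num - 1)    # dict01..dict26 -> a..z
    if 27 <= num <= 51:
        return chr(ord("A") + num - 27)   # dict27..dict51 -> A..Y
    raise ValueError("entry number not covered by this sketch")


def substitute(message, entries):
    """Replace dictionary strings in `message`, longest entry first.

    `entries` maps entry number -> plain (un-encoded) text.
    """
    by_length = sorted(entries.items(), key=lambda kv: len(kv[1]), reverse=True)
    for num, text in by_length:
        message = message.replace(text, "Z" + index_letter(num))
    return message


result = substitute("Li Yuen wants", {1: "Li Yuen", 28: " want"})
print(result)
```

Applying the longest entries first keeps a short entry from breaking up text that a longer entry would have matched.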
diff --git a/textdecomp.s b/textdecomp.s
index 617ba2f..0a0c684 100644
--- a/textdecomp.s
+++ b/textdecomp.s
@@ -1,25 +1,64 @@
 
-; text decompressor for taipan.
-; text is packed 6 bits per character. see textcomp.c
-; for details.
+; Text decompressor for Taipan.
+
+; extern void __fastcall__ print_msg(const char *msg);
+
+; Text is packed into one snac per character.
+
+; A snac is 6 bits, somewhere between a nybble and a byte. It could
+; also stand for "Six Numeral ASCII-like Code" :)
+
+; See textcomp.c for details of encoded format.
 
  .include "atari.inc"
  .export _print_msg
  .import _cputc
 
  srcptr = FR1
- outbyte = FR0 ; decoded 6-bit byte
+ outsnac = FR0 ; decoded snac (6-bit byte)
  bitcount = FR0+1 ; counts 8..1, current bit in inbyte
- inbyte = FR0+2
+ inbyte = FR0+2 ; current input byte
  ysave = FR0+3
- dict_escape = FR0+4
+ dict_escape = FR0+4 ; true if last character was a Z
 
  .rodata
-; one or two letter words are not worth listing here. 3 is only good
-; if it's used pretty often.
-; entry 0 is a dummy! The encoder gets confused by "Z\0". This may get fixed.
-; dictionary size cannot exceed 255 bytes.
-; the quoted stuff in comments is read by messages.pl, it needs to be exact.
+
+; The dictionary itself. Each entry is a snac-encoded string. One or two
+; letter words are not worth listing here: they encode to 2 bytes each,
+; plus the dictionary escape code is 2 bytes (snacs actually) per use. 3
+; is only good if it's used pretty often.
+
+; In messages.c, dict01 to dict26 will show up as Za thru Zz, and dict27
+; and up are ZA, ZB, etc. In theory, a dict entry could reference another
+; dict entry (the decoder can handle it), but in practice it's not real
+; useful to do.
+
+; Entry 0 is a dummy! The encoder gets confused by "Z\0". This may
+; get fixed.
+
+; There can be up to 63 entries in the dictionary (64, counting the
+; dummy entry 0), since a 6-bit snac is used as the index.
+
+; Dictionary size cannot exceed 255 bytes. Actually the last entry
+; can extend past 255 bytes, so long as it *starts* within 255 bytes
+; of dict00. Break this rule and you get a range error when you build.
+
+; The quoted stuff in comments is read by messages.pl, it needs to be
+; the exact un-encoded form of the snac string. Anything after the quotes
+; (e.g. number of occurrences) is ignored. The order here isn't important,
+; messages.pl will apply them in order by length (longest first).
+
+; To get the bytes to use for a particular message:
+; echo -n "message here" | ./textcomp 2>/dev/null|perl -ple 's/0x/\$/g; s/ /, /g'
+
+; TODO: no way to use \n in these (which affects dict33), fix.
+
+; TODO: if a message used in the game is exactly the same as a dict entry,
+; figure out a way for the game to use the dict entry in-place, instead
+; of a string consisting only of a dictionary lookup. Perl script can
+; generate an asm file that gets included here? _M_taipan = dict23, and
+; in messages.c it's an extern.
+
 dict00:
 dict01: .byte $98, $9d, $73, $54, $53, $80 ; "Li Yuen", 4 occurrences
 dict02: .byte $7c, $c1, $05, $4b, $57, $12, $3d, $42, $05, $48, $00 ; "Elder Brother", 3
@@ -52,6 +91,14 @@ dict28: .byte $d5, $70, $4e, $50, $00 ; " want"
 dict29: .byte $5c, $f4, $94, $20, $93, $85, $4d, $30, $00 ; "worthiness"
 dict30: .byte $d4, $d5, $43, $20, $00 ; " much"
 dict31: .byte $10, $91, $86, $15, $21, $4e, $0c, $50, $00 ; "difference"
+dict32: .byte $74, $f3, $50, $48, $11, $0f, $48, $00 ; "Comprador"
+dict33: .byte $f1, $3d, $6c, $15, $03, $d2, $50, $00 ; "'s Report"
+dict34: .byte $6d, $91, $78, $d5, $71, $7c, $30, $cd, $40 ; "Aye, we'll "
+dict35: .byte $08, $f0, $52, $10, $00 ; "board"
+dict36: .byte $40, $94, $81, $50, $50, $00 ; "pirate"
+dict37: .byte $d4, $e3, $c0 ; " no"
+dict38: .byte $d5, $72, $53, $20, $00 ; " wish"
+dict39: .byte $10, $50, $94, $00 ; "debt"
 
 dict_offsets:
  .byte dict00 - dict00
@@ -86,6 +133,14 @@ dict_offsets:
  .byte dict29 - dict00
  .byte dict30 - dict00
  .byte dict31 - dict00
+ .byte dict32 - dict00
+ .byte dict33 - dict00
+ .byte dict34 - dict00
+ .byte dict35 - dict00
+ .byte dict36 - dict00
+ .byte dict37 - dict00
+ .byte dict38 - dict00
+ .byte dict39 - dict00
 
 ; rough estimate of how many bytes are saved by the dictionary
 ; stuff: the dictionary + extra decoder stuff costs 221 bytes (vs.
@@ -94,13 +149,13 @@ dict_offsets:
 ; with only dict00 - dict23, we'll save around 173 bytes.
 ; actually it works out to 179 bytes, but the estimate was close.
 ; we've reached the point of diminishing returns: dict00 - dict31 only
-; saves 200 bytes.
+; saves 199 bytes.
 
  dictsize = * - dict00
  .out .sprintf("dictionary plus dict_offsets is %d bytes", dictsize)
 
  .rodata
-table: ; outbyte values 53..63
+table: ; outsnac values 53..63
  .byte ' ', '!', '%', ',', '.', '?', ':', 39, 40, 41, $9b
  tablesize = * - table
 
@@ -110,58 +165,57 @@ table: ; outbyte values 53..63
   .code
  .endif
 
-; extern void __fastcall__ print_msg(const char *msg);
 _print_msg:
  sta srcptr
  stx srcptr+1
  lda #0
  sta dict_escape
- sta outbyte
+ sta outsnac
  ldy #$ff ; since we increment it first thing...
 
- ldx #6 ; counts 6..1, current bit in outbyte
+ ldx #6 ; counts 6..1, current bit in outsnac
 @nextbyte:
  iny
  lda #8
- sta bitcount
+ sta bitcount  ; counts 8..1, current bit in inbyte
  lda (srcptr),y
  sta inbyte
 @bitloop:
- asl inbyte
- rol outbyte
+ asl inbyte    ; get next bit from inbyte...
+ rol outsnac   ; ...into outsnac
  dex
- beq @decode ; got 6 bits
- dec bitcount
- bne @bitloop
- beq @nextbyte
+ beq @decode   ; got 6 bits, decode into ascii
+ dec bitcount  ; more bits in this byte?
+ bne @bitloop  ; get rest of bits in this byte...
+ beq @nextbyte ; ...else next byte
 
 @decode:
- lda outbyte
- bne @notend
- rts ; 0 = end of message
+ lda outsnac
+ bne @notend   ; are we done?
+ rts           ; 0 = end of message
 
 @notend:
- ldx dict_escape ; was last character a Z?
+ ldx dict_escape ; was previous character a Z?
  beq @normalchar
 
- jsr dict_lookup
- jmp @noprint
+ jsr dict_lookup ; if so, do a dictionary lookup...
+ jmp @noprint    ; ...and pick back up at next byte
 
-@normalchar:
+@normalchar:     ; else it's a normal character
  cmp #27
  bcs @notlower
- adc #'a'-1 ; 1-26 are a-z
+ adc #'a'-1      ; 1-26 are a-z
  bne @printit
 
 @notlower:
  cmp #52
  bne @notdict
  inc dict_escape ; Z means next 6 bits are dictionary ID
- bne @noprint
+ bne @noprint    ; don't actually print the Z
 
 @notdict:
  bcs @notupper
- adc #38 ; 27-52 are A-Z
+ adc #38         ; 27-51 are A-Y
  bne @printit
 
 @notupper:
@@ -175,14 +229,17 @@ _print_msg:
  ldy ysave
 @noprint:
  lda #0
- sta outbyte
+ sta outsnac
  ldx #6
  dec bitcount
  beq @nextbyte
  bne @bitloop
 
 dict_lookup:
- ; dictionary lookup time. save our state on the stack
+ ; dictionary lookup time. save our state on the stack. note that
+ ; using the stack means dict entries could potentially contain
+ ; dictionary escapes. each level would eat 7 bytes of stack, so be
+ ; careful (the current dictionary doesn't do this at all)
  tya
  pha
  lda inbyte
@@ -194,17 +251,16 @@ dict_lookup:
  lda bitcount
  pha
 
- ; recursive call
- ldx outbyte
- lda dict_offsets,x
+ ldx outsnac ; get the start address of the dictionary entry into AX
+ lda dict_offsets,x ; this is why the dictionary can't be >255 bytes total
  clc
- adc #<dict00
- sta dict_escape ; temp usage
+ adc #<dict00    ; calculate low byte from base + offset
+ sta dict_escape ; temp usage, will be overwritten after _print_msg
  lda #>dict00
- adc #0
- tax
- lda dict_escape
- jsr _print_msg
+ adc #0          ; calculate hi byte
+ tax             ; hi byte in X
+ lda dict_escape ; lo byte in A
+ jsr _print_msg  ; recursive call, print the dictionary entry
 
  ; restore old state
  lda #0
@@ -219,7 +275,8 @@ dict_lookup:
  sta inbyte
  pla
  tay
- rts
+
+ rts ; print rest of original message
 
  decodersize = * - _print_msg
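
As a reading aid for the decoder above, here is a rough Python model of what _print_msg does. It is a sketch under assumptions, not the game's code: the names are hypothetical, the dictionary is passed in as a list of byte strings indexed by entry number (entry 0 being the dummy), and the ATASCII end-of-line ($9b) is shown as "\n".

```python
# Punctuation for snac values 53..63, in the order of `table` in textdecomp.s.
# $9b (ATASCII EOL) is rendered as "\n" here, which is an assumption.
PUNCT = [" ", "!", "%", ",", ".", "?", ":", "'", "(", ")", "\n"]


def snacs(data):
    """Yield 6-bit values from `data`, most significant bit first."""
    acc = 0
    nbits = 0
    for byte in data:
        for shift in range(7, -1, -1):
            acc = (acc << 1) | ((byte >> shift) & 1)
            nbits += 1
            if nbits == 6:
                yield acc
                acc = 0
                nbits = 0


def decode(data, dictionary=()):
    """Decode a snac-packed message into a string."""
    out = []
    escape = False                 # True if the previous snac was a Z
    for s in snacs(data):
        if s == 0:                 # 0 = end of message
            break
        if escape:                 # snac after Z is a dictionary index
            out.append(decode(dictionary[s], dictionary))
            escape = False
        elif s <= 26:              # 1-26 are a-z
            out.append(chr(ord("a") + s - 1))
        elif s == 52:              # Z: dictionary escape, not printed
            escape = True
        elif s < 52:               # 27-51 are A-Y
            out.append(chr(s + 38))
        else:                      # 53-63 come from the punctuation table
            out.append(PUNCT[s - 53])
    return "".join(out)


dict01 = bytes([0x98, 0x9D, 0x73, 0x54, 0x53, 0x80])
print(decode(dict01))
```

Feeding it the dict01 bytes gives back the string quoted in that entry's comment, which is a quick way to sanity-check a new dictionary entry by hand.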