aboutsummaryrefslogtreecommitdiff
path: root/src/col64/cruft/col64.dasm
blob: aca1259a6526a00faf64e32a24371ee664a0a588 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194

; ----------------------------------------------------------------------------
; 64x32 software text driver
; Uses 4x5 font in 5x6 character cells
; Based on disassembled COL80 driver

; This driver is missing quite a few things a full E: driver should have:

; - Does not work with BASIC or any other cart! Though if START_ADDRESS_1
;   is changed to $7Cxx, a BASIC-able version could be assembled

; - No support for cursor controls, insert/delete, or even clear screen
;   (instead the control-codes are printed as normal characters, if they're
;   above 31 decimal). This is a feature: our intended application is an
;   IRC client, where ASCII codes 125 and 126 represent } and ~ chars,
;   not clear-screen and backspace.

; - Backspace key is supported during get-byte, but printing a backspace
;   with put-byte prints a tilde instead. Again, a feature.

; - No support for low ASCII at all: any attempt to input or print a character
;   in the range 0-31 decimal is just ignored (this includes the arrow keys).
;   This is done to keep the font size at 96 characters, so it'll fit in a
;   page. Also IRC doesn't use low-ASCII.

; - Does not attempt to survive a warmstart

; - Will fail disastrously if an application sets COLCRS/ROWCRS to any
;   out-of-range values

; - Only displays the cursor during get-byte operations

; On the other hand, this driver is tiny (font is 240 bytes, code 718, not
; counting 100 bytes of throwaway init code in the tape buffer), and plays
; nice with cc65's stdio support (though NOT conio, don't even go there)

; Theory of operation:

; This is an Atari OS-compliant device driver. On load, the init routine
; replaces the E: entry in HATABS with a pointer to our vector table,
; col64_vector_tab.

; We implement the CIO commands OPEN, CLOSE, GETBYTE, and PUTBYTE. The
; GET-STATUS and SPECIAL commands are not implemented, and the CLOSE
; command simply returns success (it doesn't need to do anything else).

; When a channel is OPENed to our new E: device, the col64_open routine
; calls part of the OS's S: handler to switch to GRAPHICS 8 (320x192 mono)
; mode.

; When the CIO PUT-BYTES or PUT-RECORD calls are made on our channel, CIO
; calls our PUTBYTE routine (col64_putbyte) as needed to write (print)
; characters to the screen. col64_putbyte keeps track of the current
; cursor position (using COLCRS and ROWCRS), and either renders a glyph
; at the current location, or (when the EOL character $9B is printed)
; moves the cursor to the beginning of the next line, scrolling the
; display RAM if necessary.

; When the CIO GET-BYTES or GET-RECORD calls are made on our channel, CIO
; calls our GETBYTE routine (col64_getbyte). This routine maintains its
; own buffer of user-input characters. When first called (or any call when
; the buffer is empty), col64_getbyte reads a complete line of input using
; the OS's K: handler, echoing typed characters to the screen using our
; own col64_putbyte. Only the backspace key is supported for editing. Once
; the user presses Return, the first byte in the buffer is returned to
; the caller. On subsequent calls, characters are returned from the buffer
; until the buffer is empty. At this point, the next call to col64_getbyte
; will result in a new line of input being read.

; col64_putbyte calls render_glyph to draw a character in display RAM.
; The font consists of 96 characters (low ASCII not supported), each 4 pixels
; wide and 5 tall [*]. These are rendered into 5x6 character cells, with the
; extra pixels being left blank to provide padding between characters.
; Since 5 pixel wide cells don't line up evenly on display RAM byte
; boundaries, render_glyph has to shift each 4-pixel line of glyph
; data the appropriate number of times, and combine the result with the
; rest of each screen byte (that portion that's outside of the 5-pixel
; cell we're currently writing to). Surprisingly, this is accomplished
; slightly faster than the standard OS E: driver prints a single character.
; (unfortunately, scrolling the screen is *much* slower than the OS E:
; driver's scrolling...)

; [*] ...except the lowercase q j p q y, which have true descenders. The
; glyphs for these are still 4x5, but the 6th row of pixels for all 5
; get stored together in the last glyph in the set (ASCII code $7F,
; which should be considered an unprintable character).

 processor 6502 ; DASM-specific

; ----------------------------------------------------------------------------
 include equates.inc

; "backdoor" vectors into the S: and K: ROM drivers. These are addres-minus-1,
; called by PHA/PHA/RTS.
s_dev_open_lo   = $E410 ; (not named in OS sources)
s_dev_open_hi   = $E411 ; ""
k_dev_get_lo    = $E424 ; ""
k_dev_get_hi    = $E425 ; ""

; ----------------------------------------------------------------------------
; Constants

LAST_ROW = $1F ; last valid cursor row on screen, 31 decimal (we count from 0)
EOL      = $9B ; Atari ASCII end-of-line (aka the Return key)
INVERTED_MASK = $F8 ; stored in inverse_mask when rendering in inverse video

; ----------------------------------------------------------------------------
; Defaults

DEFAULT_TEXT_COLOR = $00
DEFAULT_BG_COLOR   = $C8
DEFAULT_RMARGN     = $3F

; ----------------------------------------------------------------------------
; ZP storage
; Reuse some of the E:/S: devices' storage
; These all get recalculated on col64_putbyte entry,
; so it's probably OK to mix with GR.8 PLOT/DRAWTO stuff

 seg.u "data"
 org $A0 ; (used to be $90 aka OLDROW)
mask_lo         ds 1 ; Holds the 16-bit mask, used by render_glyph
mask_hi         ds 1
screen_ptr_lo   ds 1 ; Pointer to display RAM: render_glyph and scroll_screen
screen_ptr_hi   ds 1 ; both use this.
font_index      ds 1 ; index into font: TMPCHR * 5
shift_amount    ds 1 ; How many times to right-shift during render_glyph

 ;org $63 ; aka LOGCOL
glyph_data_lo     ds 1 ; render_glyph stores (possibly shifted) glyph data
glyph_data_hi     ds 1 ; here. Also a 2nd display RAM pointer in scroll_screen
lo_nybble_flag    ds 1 ; non-zero if glyph data in top 4 bits of font

 ;org $68 ; aka SAVADR
inverse_mask      ds 1 ; 0 for normal video, INVERTED_MASK for inverse
line_buffer_index ds 1 ; used by col64_getbyte
descender         ds 1 ; offset-1 of true descender byte, into font_data

; We also use several other ZP locations, normally used by the OS E: driver:
; COLCRS ROWCRS TMPCHR BUFCNT RMARGN SAVMSC are used for the same purpose as
; the OS uses them for.

; ----------------------------------------------------------------------------
; Non-ZP storage (cassette buffer for now)

 org CASBUF
line_buffer       ds $80

 seg "code"
; ----------------------------------------------------------------------------
; XEX segment header

;START_ADDRESS_1 = $9C78
START_ADDRESS_1 = $9C00 ; should end at $A036; subtract 2K to use with BASIC
 org START_ADDRESS_1-6

 word $FFFF
 word START_ADDRESS_1
 word END_ADDR_1

; ----------------------------------------------------------------------------
; Font data should be aligned on a page boundary
; (for speed; still works unaligned). Each 5 bytes stores 2 glyphs
; side-by-side. When rendering into 5x6 cells, the bottom line and
; the left column are set to all 0 (or all 1 for inverse video) for padding,
; so the glyphs can take up the entire 4x5 space.
; Only character codes 32 through 127 are supported; this is 96
; characters. At 2 glyphs per 5 bytes, that's 96/2*5=240 bytes,
; which means the whole font fits into one page and can be
; accessed with X or Y indexed mode rather than (indirect),Y.

font_data:
 ;include font4x5.inc
 include col64_ext.inc

DESCENDERS = $EB ; g j p q y descender bytes, offset from font_data

; ----------------------------------------------------------------------------
; Mask tables, used by setup_mask. Indexed by (COLCRS % 8), since they
; would repeat every 8 bytes anyway. Each mask is 16 bits, and contains
; a 1 for each pixel in screen RAM that we *don't* want to change when
; writing pixels (because they're not part of the current character cell),
; but see below...

mask_lo_tab:
 byte $07, $F8, $C1, $FE, $F0, $83, $FC, $E0

; In mask_hi_tab, $00 is never a valid mask, while $FF means "don't touch any
; bits in 2nd byte". As a minor optimization, $00 appears in the table in
; place of $FF. This allows us to just test the Z flag after loading the mask
; hi byte, instead of needing to compare against $FF.
; (Actually, on further thought, the $FF's could have been left in, and the
; test could have been made on the N flag: any hi mask with bit 7 set won't have
; any other bits clear. *Shrug*)

mask_hi_tab:
 byte $00, $3F, $00, $0F, $7F, $00, $1F, $00

; ----------------------------------------------------------------------------
; Indexed by COLCRS%8, how many bits to shift the font data right
; The mask_*_tab stuff above could be calculated at runtime from this,
; the tables are only there for speed.

shift_amount_tab:
 byte $00, $05, $02, $07, $04, $01, $06, $03

; ----------------------------------------------------------------------------
; Line address tables, used by setup_screen_ptr. Indexed by ROWCRS. Used by
; setup_screen_ptr

line_addr_tab_lo:
 byte $00, $f0, $e0, $d0, $c0, $b0, $a0, $90
 byte $80, $70, $60, $50, $40, $30, $20, $10
 byte $00, $f0, $e0, $d0, $c0, $b0, $a0, $90
 byte $80, $70, $60, $50, $40, $30, $20, $10

line_addr_tab_hi:
 byte $00, $00, $01, $02, $03, $04, $05, $06
 byte $07, $08, $09, $0a, $0b, $0c, $0d, $0e
 byte $0f, $0f, $10, $11, $12, $13, $14, $15
 byte $16, $17, $18, $19, $1a, $1b, $1c, $1d

; ----------------------------------------------------------------------------
; Byte offset within scanline, indexed by COLCRS & $1F value
; Used by setup_screen_ptr.

column_byte_tab:
 byte $00, $00, $01, $01, $02, $03, $03, $04
 byte $05, $05, $06, $06, $07, $08, $08, $09
 byte $0A, $0A, $0B, $0B, $0C, $0D, $0D, $0E
 byte $0F, $0F, $10, $10, $11, $12, $12, $13

; Used to be indexed by COLCRS, rest of the table was:
 ;byte $14, $14, $15, $15, $16, $17, $17, $18
 ;byte $19, $19, $1A, $1A, $1B, $1C, $1C, $1D
 ;byte $1E, $1E, $1F, $1F, $20, $21, $21, $22
 ;byte $23, $23, $24, $24, $25, $26, $26, $27

; ----------------------------------------------------------------------------
; Handler table (HATABS will point to this).
; See the HATABS and EDITRV entries in Mapping the Atari for details.

col64_vector_tab:
 word col64_open-1     ; OPEN vector
 word col64_close-1    ; CLOSE vector
 word col64_getbyte-1  ; GET BYTE vector
 word col64_putbyte-1  ; PUT BYTE vector
 word col64_close-1    ; GET STATUS vector
 word col64_close-1    ; SPECIAL vector
 ;jmp col64_init        ; Jump to initialization code (JMP LSB/MSB)
                       ; (Note: the OS really only ever calls the JMP init
                       ; routines for handlers in ROM, this could be empty)

; ----------------------------------------------------------------------------
; Assembly version of GRAPHICS 8+16 command.

init_graphics_8:
 lda     #$08 ; graphics mode 8
 sta     ICAX2Z
 lda     #$0C ; R/W access
 sta     ICAX1Z
 jsr     open_s_dev

 ; Set default colors
 lda     #DEFAULT_BG_COLOR
 sta     COLOR2
 sta     COLOR4
 lda     #DEFAULT_TEXT_COLOR
 sta     COLOR1

 ; Protect ourselves from the OS
 lda     #<START_ADDRESS_1
 sta     MEMTOP
 lda     #>START_ADDRESS_1
 sta     MEMTOP+1
 rts

; ----------------------------------------------------------------------------
; Call the OPEN vector for the S: device, using the ROM vector table
; at $E410. The table stores address-minus-one of each routine, which is
; meant to actually be called via the RTS instruction (standard 6502
; technique, but confusing the first time you encounter it)

open_s_dev:
 lda     s_dev_open_hi
 pha
 lda     s_dev_open_lo
 pha
 rts

; ----------------------------------------------------------------------------
; Callback for the internal get-one-byte, used by the OS to implement the
; CIO GET RECORD and GET BYTES commands. This routine takes no arguments,
; and returns the read byte in the accumulator.

; Internally, COL64 maintains a line buffer. Each time col64_getbyte is
; called, it returns the next character in the buffer. If the buffer's
; empty (or if the last call returned the last character), a new line
; of input is read from the user (and the first character is returned).
; This is not exactly how the OS E: device works: it reads keystrokes but
; doesn't buffer them, until Return is hit; then it reads from screen
; memory, at the current cursor line (and the line(s) before/after,
; depending on its map of logical => physical lines). We can't do that
; because we don't store character codes in screen memory... although
; we *could* support insert/delete character, delete-line, and left/right
; arrows. We don't, to save driver code size...

; This code was borrowed from COL80, then big chunks of it were diked out
; and/or optimized.

col64_getbyte:
        lda     BUFCNT
        beq     get_line ; See if there are bytes left in the buffer

get_next_byte: ; Yes: return next byte from the buffer
        ldx     line_buffer_index
        lda     line_buffer,x
        dec     BUFCNT
        inc     line_buffer_index
        jmp     return_success

; Buffer is empty.
; Get a line of input from the user, terminated by the Return key.
get_line:
        lda     #$00
        sta     BUFCNT
        sta     line_buffer_index

show_cursor:
        jsr print_inv_space
        jsr     get_keystroke

; code meant to deal with BREAK key, left out 'cause FujiChat disables BREAK
;        cpy     #$01
;        beq     keystroke_ok
;        ldy #0
;        sty     line_buffer_index
;        sty     BUFCNT

keystroke_ok:
        cmp #$20
		  bcc show_cursor ; ignore low ASCII
        cmp     #EOL
        beq     return_key_hit

check_backs_key:
        cmp     #$7E
        beq      backs_key_hit

;check_clear_key: ; don't bother, don't need
                  ; (if we did want this, it should check delete-line instead!)
        ;;cmp     #$7D
        ;;beq     clear_key_hit

normal_key_hit:
        ldx     BUFCNT
        bmi show_cursor

buffer_character:
        sta     line_buffer,x
        jsr     col64_putbyte
        inc     BUFCNT
        bne     show_cursor

return_key_hit:
        jsr     print_space
        lda     #EOL
        ldx     BUFCNT
        sta     line_buffer,x
        inc     BUFCNT
        jsr     col64_putbyte
        bne     get_next_byte ; col64_putbyte always clears Z flag!

;;clear_key_hit:
;        jsr     clear_screen ; if we implemented it...
;;        lda     #$00
;;        sta     line_buffer_index
;;        sta     BUFCNT
;;        beq     get_line

backs_key_hit:
        jsr     print_space
        lda     BUFCNT
        beq     backs_key_done
        dec     COLCRS
        lda     COLCRS
        clc
        adc     #$01
        cmp     LMARGN
        bne     backs_same_line
        lda     RMARGN
        sta     COLCRS
        dec     ROWCRS

backs_same_line:
        dec     BUFCNT

backs_key_done:
        jmp     show_cursor

; ----------------------------------------------------------------------------
; Print a space or inverse space character at the current cursor position.
; Does not update the cursor position.

print_inv_space:
        lda #INVERTED_MASK
        byte $2C

print_space:
        lda     #$00
        sta     inverse_mask
        lda     #$00
        sta     TMPCHR
        jmp     render_glyph

; ----------------------------------------------------------------------------
; Get a keystroke (blocking). Just calls the OS K: get-one-byte routine
; (call by pushing address-minus-one then doing an RTS)

get_keystroke:
        lda     k_dev_get_hi
        pha
        lda     k_dev_get_lo
        pha
        ;rts ; fall through to return_success


; ----------------------------------------------------------------------------
; Unimplemented CIO callbacks here. Also various other routines jump here
; to return success to the caller.

col64_close:
return_success:
 ldy #$01
 rts

; ----------------------------------------------------------------------------
; CIO OPEN command callback

col64_open:
 jsr     init_graphics_8
 lda     #$00
 sta     ROWCRS
 sta     COLCRS
 sta     BUFCNT
 sta     LMARGN
 lda     #DEFAULT_RMARGN
 sta     RMARGN
 rts

; ----------------------------------------------------------------------------
; CIO PUT BYTE command callback
; The byte to put is passed to us in the accumulator.

col64_putbyte:
 ; EOL (decimal 155)?
 cmp     #EOL
;;;        bne     check_clear
 bne     regular_char
 lda     RMARGN
 sta     COLCRS
 jmp     skip_write

;;;check_clear:
;;; ; save memory by not including clear_screen
;;; ; (also, this lets us print the } character)
;;;        ; Clear (decimal 125)?
;;;        cmp     #$7D
;;;        bne     regular_char
;;;        jmp     clear_screen
;;; .endif
;;;

; See if this is an inverse video char (code >= 128)
regular_char:
 tax
 bpl not_inverse
 lda #INVERTED_MASK
 .byte $2C ; aka BIT abs, skip the LDA
not_inverse:
 lda #$00
 sta inverse_mask

 txa
 and #$7F
 sec
 sbc #$20 ; subtract $20 because we only handle codes $20 to $7F
 bcs not_low_ascii
 jmp return_success ; ignore any chr codes $00-$1F or $80-$9F

not_low_ascii:
 sta TMPCHR

 lda DINDEX ; OS stores current graphics mode here
 cmp #$08
 beq graphics_ok
 ; If we're not in GRAPHICS 8 mode, reinitialize ourselves
 jsr col64_open

graphics_ok:
 jsr render_glyph

skip_write:
 ; Move the cursor 1 space to the right. This will
 ; advance us to the next line if we're at the margin,
 ; and scroll the screen if needed
 jsr     advance_cursor

; Could implement SSFLAG logic here
;check_ssflag:
 ; The OS keyboard interrupt handler will toggle SSFLAG (start/stop fla
 ; any time the user presses ctrl-1
 ;lda     SSFLAG
 ;bne     check_ssflag
 jmp     return_success

; ----------------------------------------------------------------------------
; Call the routines that actually print the character.
; render_glyph prints the character in TMPCHR at the current
; COLCRS and ROWCRS, and does NOT advance the cursor.
; TMPCHR should already have bit 7 stripped; render_glyph will
; use inverse_mask, so the caller should have set that up as well.

; Note: The 4 subroutines were only ever called from render_glyph,
; so to save a tiny bit of time and space, they're now inline code.

render_glyph:
 ;jsr     setup_mask
 ;jsr     setup_screen_ptr
 ;jsr     setup_font_index
 ;jmp     write_font_data

; ----------------------------------------------------------------------------
; mask is used to avoid overwriting pixels outside the character cell
; we're currently writing. Since 5 pixel wide cells don't align on byte
; boundaries in screen RAM, we have to read/modify/write 2 bytes, and
; the mask also has to be 2 bytes wide.
; Also we set up shift_amount here.

setup_mask:
 lda COLCRS
 and #$07
 tax
 lda mask_lo_tab,x
 sta mask_lo
 lda mask_hi_tab,x
 sta mask_hi
 lda shift_amount_tab,x
 sta shift_amount
 ;rts


; ----------------------------------------------------------------------------
; Make (screen_ptr_lo) point to the first byte of screen RAM on the top line
; of the current char cell. Assumes COLCRS/ROWCRS are never out of range!
setup_screen_ptr:
 ; first the row... table lookup quicker than mult by 240
 ldx ROWCRS                   ; +3 = 3
 clc                          ; +2 = 5
 lda SAVMSC                   ; +3 = 8
 adc line_addr_tab_lo,x       ; +4 = 12
 sta screen_ptr_lo            ; +3 = 15
 lda SAVMSC+1                 ; +3 = 18
 adc line_addr_tab_hi,x       ; +4 = 22
 sta screen_ptr_hi            ; +3 = 25

 ; now do the column. column_byte_tab is only a half-table, so do lookup
 ; on bits 0-4 of COLCRS, then add 20 decimal if bit 5 of COLCRS was set.
 lda COLCRS
 tay
 and #$1F
 tax
 lda column_byte_tab,x
 cpy #$20
 bcc ssp_noadd
 adc #$13 ; +1 because C set = 20 dec

ssp_noadd:
 clc
 adc screen_ptr_lo
 sta screen_ptr_lo
 lda #0
 adc screen_ptr_hi
 sta screen_ptr_hi

 ;rts

; ----------------------------------------------------------------------------
; Set up font_index to point to the font_data bitmap for the character in
; TMPCHR. Also sets lo_nybble_flag to let the caller know whether the
; bitmap is in the upper or lower 4 bits of the bytes pointed to.

; Calculation is:
; lo_nybble_flag = (TMPCHR & 1) ? $FF : $00;
; font_index = (TMPCHR >> 1) * 5;

setup_font_index:
        lda     #$00
        sta     lo_nybble_flag
        lda     TMPCHR
        lsr     ; a = (TMPCHR >> 1)
        tay     ; y = a
        bcc     font_hi_nybble
        dec     lo_nybble_flag ; = $FF

font_hi_nybble:
        clc
        asl ; a *= 2
        asl ; a *= 2
        sta font_index
        tya
        adc font_index
        sta font_index

 ;       rts


; ----------------------------------------------------------------------------
; True descender support: we've got a hard-coded list of 5 characters
; (g j p q y) that have them. All the descender graphics are stored in the
; bottom nybble of the last glyph in the font, which means we can only have
; 5 characters that have descenders.
; TODO: tighten up this code some (faster/smaller)

 lda TMPCHR
 ldx #$FF

 cmp #'g-$20
 bne chk_j
 ldx #DESCENDERS-1
chk_j:
 cmp #'j-$20
 bne chk_p
 ldx #DESCENDERS
chk_p:
 cmp #'p-$20
 bne chk_q
 ldx #DESCENDERS+1
chk_q:
 cmp #'q-$20
 bne chk_y
 ldx #DESCENDERS+2
chk_y:
 cmp #'y-$20
 bne not_y
 ldx #DESCENDERS+3

not_y:
 stx descender

; ----------------------------------------------------------------------------
; When write_font_data is called, all this stuff must be set up:
; - font_index is the 1-byte index into font_data where the current glyph is
; - lo_nybble_flag is 0 for high nybble of glyph data, $FF for low
; - mask_lo/hi is our 16-bit pixel mask (1's are "leave original data")
; - shift_amount is # of times to shift glyph data right
; - screen_ptr_lo/hi points to the 1st byte on the top line
; - inverse_mask is 0 for normal or INVERTED_MASK for inverse
; - descender is the offset *minus one* of the byte holding the bottom 4
;   pixels in the 4x6 character. Most of the time this will be $FF,
;   AKA the the offset from font_data of the first byte of the space glyph
;   (minus one), which has 0000 in its top nybble. The rest of the time,
;   the descenders are taken from the bottom nybble of the last character
;   glyph (which would be the unprintable code $7F). The logic that does
;   the descenders is hard-coded to use the top nybble for $FF and the
;   bottom nybble for anything else.

; Loop 5 times, each time thru the loop:
; - extract 4-bit glyph data, store in glyph_data_lo
; - apply inverse_mask
; - shift right shift_amount times
; ...write data...
; - add 40 to 16-bit screen_ptr_lo/hi (to get to next line)
; ...then one more loop with font_index and lo_nybble_flag set
; to a byte in the space glyph, to handle the bottom row (which
; is always 0 for normal or all 1's for inverse)

write_font_data:
 ldx #$05 ; outer loop counter, aye

wfont_line_loop:
 lda #$00
 sta glyph_data_hi

 ldy font_index
 lda font_data,y ; note: entire font must fit in one page!

 bit lo_nybble_flag
 beq use_hi_nybble

 asl ; the pixels we want are in the low nybble of the font data byte.
 asl ; shift them up to the top nybble
 asl
 asl

use_hi_nybble: ; 4-bit glyph data now in hi nybble
 and #$F0
 eor inverse_mask
 sta glyph_data_lo

 ldy shift_amount
 beq wfont_no_shift

wfont_shift_loop: ; right-shift 16-bit glyph data shift_amount times
 lsr glyph_data_lo
 ror glyph_data_hi
 dey
 bne wfont_shift_loop

wfont_no_shift: ; Y always zero when we get here
 lda mask_lo
 and (screen_ptr_lo),y
 ora glyph_data_lo
 sta (screen_ptr_lo),y

 lda mask_hi
 beq wfont_skip_hi ; don't bother with 2nd screen byte if we don't need to
 iny
 and (screen_ptr_lo),y
 ora glyph_data_hi
 sta (screen_ptr_lo),y

wfont_skip_hi:
 dex
 bmi wfont_done
 bne wfont_not_bottom

 ;stx font_index
 ;stx lo_nybble_flag

; X always 0 here: for last line, use the descender (if there's no
; descender for this char, that's $FF. font_index will get inc'ed,
; meaning non-decender chars use the first byte of the space glyph
; as their "descender")
 lda descender
 sta font_index
 cmp #$FF
 beq wfont_not_bottom
 sta lo_nybble_flag

wfont_not_bottom:
 lda #$28 ; add 40 bytes to point to next line of screen RAM
 clc
 adc screen_ptr_lo
 sta screen_ptr_lo
 bcc wfont_noinc
 inc screen_ptr_hi

wfont_noinc:
 inc font_index
 bne wfont_line_loop ; branch always

wfont_done:
 rts

; ----------------------------------------------------------------------------
; Not the fastest scroller in the west... TODO: make faster :)

; lda_operand points to line N (minus one byte)
; sta_operand points to line N+1 (minus one byte)
; Y ranges 240 to 1 in the loop, hence the -1 byte
; CAUTION: Self-modifying code! Proceed with caution.

; This version gives us overall driver speed almost 20% faster
; than the original scroll_screen (last commented-out one below)
; Also just scrolling 250 times is slightly faster than COL80
; (1505 jiffies; setting CRITIC saves 48 jiffies)
; (makes it 17% the speed of the OS E: driver, where COL80 is 15%)
; Note that E80 is slightly over twice as fast at this as the OS!

; Note about performance where scrolling is *not* concerned:
; Printing 914 non-control characters on a freshly-cleared screen:
; OS E: - 172 jiffies (subtract 12 = 160 jiffies if cursor disabled)
; E80 -   168 jiffies (about same as OS E:)
; COL80 - 116 jiffies (yes, it's fastest!)
; CON64 from SDX 4.41 - 227 jiffies first time I tried, crashed next time!
; COL64 - 142 jiffies (still faster than OS E:)

scroll_screen:
; lda SDMCTL
; pha
; lda #0
; sta SDMCTL

; lda CRITIC
; pha
; lda #1
; sta CRITIC

 lda SAVMSC
 sec
 sbc #1
 sta lda_operand
 lda SAVMSC+1
 sbc #0
 sta lda_operand+1
 ldx #LAST_ROW

scroll_line_loop:
 ldy #$F0

 lda lda_operand+1
 sta sta_operand+1
 lda lda_operand
 sta sta_operand
 clc
 adc #$F0
 sta lda_operand
 bcc ss_noinc
 inc lda_operand+1
ss_noinc:

scroll_byte_loop:
lda_operand = *+1
 lda 0,y
sta_operand = *+1
 sta 0,y
 dey
 bne scroll_byte_loop
 dex
 bne scroll_line_loop

scroll_blank:
 lda lda_operand
 sta sta2_operand
 lda lda_operand+1
 sta sta2_operand+1
 lda #0
 ldy #$F0
sblank_loop:
sta2_operand = *+1
 sta 0,y
 dey
 bne sblank_loop

; pla
; sta SDMCTL
; pla
; sta CRITIC
 rts

;; This version is faster and fully functional
;; (overall driver speed is 10% faster than with the
;; one below)
; glyph_data_lo points to line N (minus one byte)
; screen_ptr_lo points to line N+1 (minus one byte)
; Y ranges 240 to 1 in the loop, hence the -1 byte

;;scroll_screen:
;; lda SAVMSC
;; sec
;; sbc #1
;; sta screen_ptr_lo
;; lda SAVMSC+1
;; sbc #0
;; sta screen_ptr_hi
;; ldx #LAST_ROW
;;
;;scroll_line_loop:
;; ldy #$F0
;;
;; lda screen_ptr_hi
;; sta glyph_data_hi
;; lda screen_ptr_lo
;; sta glyph_data_lo
;; clc
;; adc #$F0
;; sta screen_ptr_lo
;; bcc ss_noinc
;; inc screen_ptr_hi
;;ss_noinc:
;;
;;scroll_byte_loop:
;; lda (screen_ptr_lo),y
;; sta (glyph_data_lo),y
;; dey
;; bne scroll_byte_loop
;; dex
;; bne scroll_line_loop
;;
;;scroll_blank:
;; lda #0
;; ldy #$F0
;;sblank_loop:
;; sta (screen_ptr_lo),y
;; dey
;; bne sblank_loop
;;
;; rts

; Clock cycle timings:
; 1792080 CPU cycles per second (or is it 1789773?)
; 29868 per jiffy on NTSC (1/60 sec)
; DL = 174 bytes @ 1 cycle each
; Screen RAM = 7680 bytes @ 1 ea
; Refresh = 262 * 9 = 2358
; so ANTIC steals 10212 (34.2%) cycles, leaves 19656 (65.8%)
; Of that time, the OS steals some doing VBLANK processing.
; It may be that I can set CRITIC here and get a little of
; that back (depends on what, if anything, that does to
; the RS232 driver and/or custom keybuffer stuff in FujiChat).

; Now if I could actually do a 100% unrolled scroller, simply
; LDA blah:STA blah-1 7680 times, with NMI/VBI/ANTIC disabled,
; that would be 4+4=8 cycles times 7680 = 61440, or 3.12 frames
; (or 0.052 sec, call it 1/20 sec). Obviously can't do it that way.

; Did some tests of overall performance (not just scrolling: print
; 24 lines of 35 chars, then 24 newlines, repeat in a loop 10x)
; Turning off ANTIC DMA seems to save 25% time. Setting CRITIC only
; saves 3% or so. Disabling IRQs/NMIs entirely is probably not an
; option.

;; old (slower, but working) version:
;; scroll_screen:
;;  lda #0                ; +2 = 2
;;  sta COLCRS            ; +3 = 5
;;  sta ROWCRS            ; +3 = 8
;;  jsr setup_screen_ptr  ; +6 = 14, +54 = 68
;;
;; scroll_line_loop:
;;  lda screen_ptr_lo     ; +3 = 3
;;  sta glyph_data_lo     ; +3 = 6
;;  lda screen_ptr_hi     ; +3 = 12
;;  sta glyph_data_hi     ; +3 = 15
;;  ldx ROWCRS            ; +3 = 18
;;  cpx #LAST_ROW         ; +2 = 20
;;  beq scroll_blank      ; +2 = 22 (+1 more last time thru)
;;  inx                   ; +2 = 24
;;  stx ROWCRS            ; +3 = 27
;;  jsr setup_screen_ptr  ; +6+54 = 87
;;  ldy #0                ; +2 = 89
;;                        ; times 31 is 2759, +1 = 2760
;;
;; scroll_byte_loop:
;;  lda (screen_ptr_lo),y ; +5 = 5
;;  sta (glyph_data_lo),y ; +6 = 11
;;  iny                   ; +2 = 13
;;  cpy #$F0              ; +2 = 15
;;  bne scroll_byte_loop  ; +3 = 18 (-1 last time thru)
;;                        ; ...times 31 times 240 - 1 = 133919 (!)
;;  beq scroll_line_loop  ; +3*31-1 (92)
;;
;; scroll_blank:
;;  jsr setup_screen_ptr  ; +6+54 = 60
;;  ldy #0                ; +2 = 62
;;  tya                   ; +2 = 64
;; sblank_loop:
;;  sta (screen_ptr_lo),y ; +6 = 6
;;  iny                   ; +2 = 8
;;  cpy #$F0              ; +2 = 10
;;  bne sblank_loop       ; +3 = 13 (-1 last time)
;;                        ; times 240, minus 1 = 3119
;;
;;  rts                   ; +6
;;
;; Total routine time is 68 + 2760 + 133919 + 92 + 64 + 3119 + 6 = 140028
;; Divide by free cyc/frame: 140028/19656 = 7.12 frames, or 0.11 sec,
;; not counting VBLANK overhead. 95% of the time is of course in the
;; inner loop. Every cycle saved in the inner loop is worth about
;; 5% of the runtime! So switching from incrementing to decrementing
;; Y will save 10% of the time (from not having the CPY). Using self-
;; modifying code for the lda/sta saves 2 more cycles (10% more), but
;; adds a small bit of overhead in the outer loop.
;; The scroll_blank loop is only 2.2% of the total time. Every cycle
;; shaved off its loop is worth only 1/3 of a percent of the total
;; time (switching to dey is worth 2/3 of 1%).
;; Replace jsr setup_screen_ptr in scroll_line_loop with loading from
;; the tables directly will save approx. 35 cycles per outer loop,
;; or only about 0.8%.
;; So total savings is 20% + 0.6% + 0.8% = 21.4%, meaning real time
;; drops to approx 0.085 sec, or approx 5.5 NTSC frames (another way
;; to look at it: 11.75 scrolled lines/sec instead of the original 9)
;; Yet another way: 2 1/3 seconds to scroll the whole screen, vs
;; the original 3 1/2. Still abysmal.

;; One possibility would be to borrow a page from the ACE-80 book and
;; update LMSes within the DL. This would bloat the display list by
;; 64 bytes (not so bad). We'd basically treat screen RAM as a circular
;; buffer of 32 lines. Every scroll, update the LMS operands to point
;; to the next line-buffer, and whichever buffer is the last visible
;; line needs to be blanked. Got to see how much code this will add
;; (possibly the routine that does this might not be longer than the
;; scroll_screen I've been using). ACE-80 (or anyway E80) uses way
;; more screen RAM than it looks like it needs, plus DLIs.

; ----------------------------------------------------------------------------
; Move the cursor one space to the right (to the next line if at the margin,
; and scroll screen if on the last row)

advance_cursor:
        inc     COLCRS
        lda     RMARGN
        cmp     COLCRS
        bcs     same_line
        lda     LMARGN
        sta     COLCRS
        lda     ROWCRS
        cmp     #LAST_ROW
        bcc     no_scroll
        jsr     scroll_screen
        ; Move to last row after scrolling
        lda     #LAST_ROW-1
        sta     ROWCRS

no_scroll:
        inc     ROWCRS

same_line:
        rts

END_ADDR_1 = *-1


; ----------------------------------------------------------------------------
; Initialization. If we don't want the handler to survive a warmstart, we
; can load this bit into e.g. the cassette buffer (throw away after running)

START_ADDRESS_2 = CASBUF

; XEX segment header for throwaway init code
 rorg START_ADDRESS_2-4
 word START_ADDRESS_2
 word END_ADDR_2

col64_init:
        ldy     #$00

next_hatab_slot:
        lda     HATABS,y
        beq     register_x_handler
        iny
        iny
        iny
        cpy     #$20
        bcc     next_hatab_slot
        jmp     return_success

register_x_handler:
        lda     #$58
        sta     HATABS,y
        lda     #<col64_vector_tab
        iny
        sta     HATABS,y
        lda     #>col64_vector_tab
        iny
        sta     HATABS,y
        jmp     return_success

main_entry_point:
        jsr     col64_init
        lda     #$0C
        sta     ICCOM
        ldx     #$00
        jsr     CIOV
        lda     #$58
        sta     font_index
        lda     #$03
        sta     ICCOM
        lda     #font_index
        sta     ICBAL
        lda     #$00
        sta     ICBAH
        ldx     #$00
        jsr     CIOV
        ldy     #$07
        lda     #<col64_vector_tab
        sta     HATABS,y
        lda     #>col64_vector_tab
        iny
        sta     HATABS,y
no_e_handler:
        lda     #<START_ADDRESS_1
        sta     MEMTOP
        lda     #>START_ADDRESS_1
        sta     MEMTOP+1
        jmp     return_success

END_ADDR_2 = *-1

; XEX segment (run address)
 word INITAD
 word INITAD+1
 word main_entry_point

; That's all, folks...

; Rest of file is me rambling & meandering, ignore or not as you choose

; Possible improvements, some of which may apply to this driver only,
; some of which may apply to a (to be written) 40-column renderer.
; Redb3ard has designed really nice 8x8 normal, bold, italic fonts,
; including a bunch of Unicode normal characters... the 40-column
; driver would include UTF-8 support. Of course there's not enough
; RAM for an IRC client, TCP/IP stack, screen/DL memory, driver,
; UTF-8 tables, and 3 or 4 fonts... will have to wait until after
; the 1.0 FujiChat release (my roadmap for 1.0 is that everything
; needs to work on a 48K unmodded 800; 2.0 will have new features
; that may only work on an XL or even require 130XE banks, though
; it will still be as functional as 1.0, on an 800).

; True descenders and 5-bit-wide characters.

; For descenders, could simply leave the top row of pixels blank on
; the screen, and draw the 5 rows of character in the bottom 5 of
; the 6-scanline block. They'd possibly touch the character on the
; next line...

; 5-bit-wide characters should be used sparingly: only M W m w T
; and maybe Y V v # need them. The "extra" bit could go on the left,
; and just be a copy of the rightmost bit, since these characters
; are symmetrical... but these would often touch the character on
; the left. Capital letters, it's less of a big deal in normal English
; text, since a capital letter should be preceded by a space (or at
; least punctuation), and it won't touch the character on the right
; unless it's another 5-bit-wide one (few words in English are going
; to have that problem; maybe the name AMY in all caps). Unfortunately
; I can't think of a good way to encode the bits of information that
; say a character needs the extra pixel (or a descender either) other
; than a hard-coded set of compares against specific char codes, or
; else a loop over a table of them. Either way there's a loss of
; performance: most characters will fail all the compares...
; It'd be possible to do a 5x5 font, or even 5x6, but 5 bits wide
; means we can't pack the pixel data 2 rows per byte (so the
; font won't fit in a page any more, plus we have to use (zp),y
; or something to access it...)

; Multiple font support. Since it's an IRC-specific driver, could just
; have _putbyte directly interpret the IRC protocol formatting codes,
; render e.g. bold as inverse, italic as underline if I implement it,
; or else do up a separate italic font (if it's possible to make one
; look good in 4x5). _putbyte could at least strip out the codes it
; can't render (e.g. color).

; I've thought of doing underline (either a separate mode, or instead
; of inverse video), but a lot of the glyphs will look like crap with
; the underline, because there's no room for spacing: the underline
; will touch the bottom pixels of most characters (and overwrite the
; descenders if we add true descender support...)

; Obviously, making it a fully-functional E: replacement would be nice.
; I'm not doing this because I'm only wanting to use the driver for
; text-only cc65 programs (specifically FujiChat), though possibly
; I'll come back to col64 and do a proper version later.

; Also nice: user-adjustable number of rows. Most peoples' monitors can
; display something above 200 scanlines (particularly if they're in PAL
; countries). As-is, we're using 192 scanlines to get 32 rows (of 6 lines
; each). Most people could easily display 34 rows (204 scanlines), and
; not a few could display 35 (210), particularly if the blanking
; instructions at the start of the display list are user-adjustable too
; (to move the picture up or down). Memory usage for video RAM would
; grow by 240 byte per row of characters, and of course we'd need code
; to set up the custom display list (which would be longer than a
; normal one for GR.8). Probably would put a limit of 36 or 40 rows
; since the various tables are fixed-size...

; If there were RAM enough for it, or if we decide to use XE extra
; banks (or XL/XE under-OS RAM), could have _putbyte also store
; every output character in a buffer, for FujiChat to use as
; scrollback/log buffer (or any other program to use as it sees
; fit, of course).

; User-loadable font isn't a bad idea: the init code can look for a
; file D:COL64.FNT and load it into font_data if found.

; Could implement standard ASCII backspace/tab/linefeed/CR/etc codes,
; using low ASCII codes. This would be suitable for a telnet or terminal
; client (not needed for IRC)

; I really would like to do auto-relocation. Program could load itself
; wherever, and relocate itself (using a table of offsets, for absolute
; addresses within the program) to either MEMLO or MEMTOP. This would
; mean more init code only.

; It'd be nice to speed up scrolling. One dumb trick would be to use
; self-modifying code: instead of (zp),y addressing, use abs,y
; in the inner loop, and the outer loop modifies the operands in
; the inner. Would mean the code can't be burned to ROM, but who
; was planning to do that anyway? Also simply moving more than one
; chunk per outer-loop iteration would be a small speedup, and so
; would moving 256 bytes per, instead of 240.
; Since the inner loop does lda (zp),y [5+cyc] and sta (zp),y [6cyc]
; 7440 times, replacing them with lda abs,y [4+cyc] and sta abs,y
; [5cyc] should save 7440*2 = almost 15K cycles.
; which may not be noticeable... Unrolling the loop is even less of a
; gain: the outer loop only runs 32 times. Could set CRITIC during the
; move (not sure that helps much). No matter how you slice it,
; moving 8K on a 6502 is *slow*.

; One slightly cool idea would be to do a proper OPEN of channel #6
; to S:, like the BASIC GR. command does, instead of the "shortcut"
; using ZIOCB and calling the S: open routine directly. Doing this
; would allow PLOT, DRAWTO, and XIO 18 (fill).