Dynamic screen memory for FujiNetChat
-------------------------------------

This *doesn't exist* yet, even if I speak of it in the present tense
in this document!

Goals:

- Maximize use of available memory for scrollback. This means variable
  sized screens, not fixed to a given address (though they *will* be
  fixed to a given bank in extended RAM).

- Support at least 15 screens. Maybe more. There will be *some* limit,
  anyway.

- Don't waste memory: if you only use a few screens, you shouldn't have
  a bunch of memory reserved for the screens you don't use. You'll only
  pay for what you use.

- Compatible with 48K, 64K (using RAM under OS), 130XE, and at least
  256K or 320K upgraded memory. It will be able to use as much RAM as
  you have, up to some limit (U1MB?).

- *Not* require the 130XE's separate ANTIC access mode. While this
  might be helpful, there's a large installed base of expanded 800XLs
  that don't support it. Plus, the ANTIC bit in PORTB is one of the
  bits that might get repurposed as a bank bit, for machines with
  loads of RAM.

- Able to use the extra 6.4K you get if you boot without DOS (straight
  from the FujiNet).

- Able to use 3K of the RAM under the OS in an XL/XE machine. This would
  be at $D800-$E3FF: the floating point pack and the font. If we could
  think of a use for another 1K, it's available at $CC00 (the international
  font).

- A *single* executable that works on all of the above (no special
  fnchatxe.xex for extended RAM), meaning it has to detect the
  amount of RAM and which extended banks exist.

- Adding new text to a screen won't require scrolling all the existing
  text up by moving it in memory, so it'll be *fast*.

- Config options to disable some of the memory. If you're running
  SpartaDOS, you need a way to tell FujiNetChat not to use the extra
  RAM under the OS. If you use a ramdisk that only uses some of your
  extended memory banks, you need a way to tell FujiNetChat to leave
  those banks alone.

Terms:

Bank - Hopefully you already know the concept of bankswitching. In
  this document, I number the banks 0 (for the base 64K) through...
  I suppose up to 255, since I'll use a byte to store the bank
  number. This means we might support up to 4MB of memory, if
  such a thing exists for the Atari. An unexpanded 130XE has
  5 banks, which I number 0 through 4. A Rambo 800XL has 13
  banks, numbered 0 to 12.

Chunk - a 40x23 (or smaller) piece of a screen (can be thought of as a
  "display window"). At any given time, the screen can only be
  displaying one chunk. Normally, this is the bottom-most one, where
  new text is printed as it comes in. The top-most chunk of a screen
  can be fewer than 23 lines (e.g. if there are 30 lines, you get one
  23-line chunk and one 7-line one).

End Marker - a "special" line whose pointer is set to point to itself,
  and whose data is all spaces. This line is shared by all screens, and
  can actually be displayed (e.g. if the screen has fewer than 23 lines,
  they are displayed at the bottom, and the rest of the GR.0 lines
  are all end markers).

Screen - a scrollable (backwards and forwards) area that displays text,
  like FNChat already uses. In this scheme, each screen will have a
  pool number, a line count, a scrollback line count (0 = not scrolled
  up) and a pointer to the first (bottom-most) line. If the pointer
  points to the End Marker line, that means no lines are assigned to
  the screen yet (it was just created and hasn't been written to yet).
  Otherwise, it points to the address of the *bottom-most* line.
  A null pointer (0) would be an error, and should never exist.
  Also the screens will have a title and a status, like the current
  ones do.

Scrollback - as a noun: the part of the screen that's not normally
  visible. As a verb, the act of making that part visible. Scrolling
  will generally be done one chunk at a time, though there's no
  reason there couldn't be a "one line at a time" scrolling mode.

Screen height - the total number of lines in a screen (includes all
  its chunks). The minimum height of a screen, upon creation, is
  actually 0: it has no lines until it's written to.

Line - 42 bytes of memory that store 40 characters (one GR.0 line) of
  text, plus a 2-byte pointer to the next line (in the screen, or in
  the free line list). Lines in a screen are stored in a linked list
  (each points to the next), in reverse order of how they're displayed
  (bottom-most points to the 2nd-to-bottom, etc, and the top one in
  a screen points to the End Marker). Lines in the free list are also
  stored as a linked list, associated with the pool, not any screen.
  A single line cannot cross a 4K boundary, because ANTIC wouldn't
  be able to display it properly.

Free Line - a line that isn't being used by any screen. All the
  free lines in a pool are a linked list: initializing the pool
  sets up the pointers in all the lines. Closing a screen releases
  all its used lines into the pool's free line list.

Pool - A (possibly non-contiguous) region of memory available for lines.
  Each pool has a bank number, a count of unused lines, and a linked
  list of the unused lines in the pool. Pool 0 is in main memory,
  is always at least 16K, and can be up to 26880 bytes in size (using
  4K of under-the-OS RAM for an XL/XE, plus the space from $0700 to $2000
  if DOS is not booted). The other pools consist of entire banks, 16K
  apiece, one per bank. Each screen is created in one pool and cannot
  be moved to another pool.

Initialization:

At startup, FujiNetChat detects the amount of memory (number of
extended banks) and creates a pool for each bank. Bank 0's pool can
include extra memory beyond 16K: whatever's not in use by the client,
or by DOS (if you booted with one, even). Also pool 0 might have some
of the RAM under the OS on XL/XE (because FujiNetChat doesn't use the
two 1K ROM fonts or the 2K math pack, so we get 4K "for free"). All other
pools (1 and up) will be 16K.

At startup, the [server] and [private] screens will be created. Also
autojoin channels/queries will each get a screen created.

When creating screens, they're assigned to pools in round-robin
style. Suppose we have 5 banks of memory (0 through 4), with one pool
each. The first screen is created in bank 0. The second screen will be
created in bank 1, the 3rd in bank 2, etc. After all pools have one screen
in them, the next screen creation will use pool 0 again (so now
we have two screens in one pool). This can continue until we reach
whatever the limit is: 15 screens? 20? Maybe calculated based on the
number of banks, so we can guarantee that when all memory is in use,
each screen will have a minimum of 23 lines. With a 130XE, this would
be a stupid amount of screens: 16 per bank for the 4 extended banks
(390 lines / 23), and at least 16 for bank 0 (so 80 or more of them,
that's too many). Maybe
limit it to 28, which guarantees each screen can be 3 chunks (69
lines) tall?

The reason for the round-robin creation: Suppose you're only going to
use 3 screens (server, private, and one channel). It makes more sense
for each of those 3 to be in its own bank, so each one can grow to
16K (around 390 lines, or ~17 chunks). If we created them all in bank 0,
they'd compete with each other for memory, which is silly when there's
plenty of free RAM in the other (unused) banks.
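
A minimal sketch of the round-robin assignment, in hypothetical C. The
names pool_count, next_pool and pick_pool() are made up for illustration
(nothing like them exists yet), and it assumes the pools were already
set up at startup:

```c
#include <assert.h>

/* Hypothetical state, not part of the structs defined in this document. */
static unsigned char pool_count = 5; /* e.g. a 130XE: pools 0 through 4 */
static unsigned char next_pool = 0;  /* pool that gets the next new screen */

/* Return the pool number for a newly created screen, then advance to
   the next pool, wrapping back to pool 0 after the last one. */
static unsigned char pick_pool(void)
{
    unsigned char p = next_pool;

    assert(pool_count > 0);
    next_pool = (unsigned char)((next_pool + 1) % pool_count);
    return p;
}
```

With 5 pools, six calls in a row give pools 0, 1, 2, 3, 4, 0. A real
version would probably also have to skip pools the user disabled in
the config.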

Writing text to a screen consists of...

- Find a free line in the screen's pool (see below).
- Fill the line with the new text.
- Make the line's 'next' pointer point to whatever the screen's 'head'
  pointed to.
- Make the screen's 'head' point to the new line.

The lines in a screen are stored as a textbook example of a linked list.
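
The steps above could look roughly like this. It's a sketch only: the
structs are cut down to the fields it needs, screen_init() and
screen_push_line() are hypothetical names, the line is assumed to have
been taken from the pool already, and the bankswitching is left out
entirely:

```c
#include <assert.h>
#include <string.h>

#define LINE_LEN 40

typedef struct line_s {
    struct line_s *next;
    char data[LINE_LEN];
} line_t;

typedef struct {
    int line_count;
    line_t *line_list; /* head: the bottom-most line */
} screen_t;

/* Shared end marker: all spaces, 'next' points to itself. */
static line_t end_marker;

static void end_marker_init(void)
{
    memset(end_marker.data, ' ', LINE_LEN);
    end_marker.next = &end_marker;
}

/* A new screen has no lines: its head is the end marker. */
static void screen_init(screen_t *s)
{
    s->line_count = 0;
    s->line_list = &end_marker;
}

/* Fill ln with text (space-padded or truncated to 40 characters) and
   push it onto the screen as the new bottom-most line. */
static void screen_push_line(screen_t *s, line_t *ln, const char *text)
{
    size_t n = strlen(text);

    assert(ln != NULL && ln != &end_marker);
    if (n > LINE_LEN)
        n = LINE_LEN;
    memset(ln->data, ' ', LINE_LEN);
    memcpy(ln->data, text, n);
    ln->next = s->line_list; /* new line points at the old head */
    s->line_list = ln;       /* and becomes the new head */
    s->line_count++;
}
```

Nothing gets moved in memory: adding a line is two pointer writes plus
the 40-byte fill, no matter how tall the screen is.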

What happens if we're displaying a screen in one bank, and need
to add text to a screen that's in a different bank? Well, we have
to bankswitch to write to the new bank. But doing so will make that
bank replace the screen memory for the screen we were looking at. So,
bankswitching and writing has to take place during the vertical blank
interval, when ANTIC is done displaying the screen and no longer
reading from RAM.

*Careful*, without writing the code I don't yet know if there's enough
time in one VBLANK to write a huge (up to 510 bytes) IRC message in
one go. It'll be OK if it takes more than one frame, but not more than
maybe 4 or 5 (that'll make the app feel sluggish). Assembly optimization
is a must for this. Also, we don't have to wait for the VBLANK interrupt
to happen: we can start after the last visible scanline and work through
until just before the start of the first visible scanline on the next
frame.

Finding a free line:

- See if there's a free line in the pool (if the head of the free lines
  list is not null, and/or if the free lines count is not 0).
- If you find one, add it to the screen (see above), and remove it
  from the free list (make the pool's free_list point to whatever
  the line's 'next' pointed to). Also decrement the free lines count for
  the pool.
- If there isn't a free line, we have to 'steal' one from another
  screen in this pool. For now, just take the one with the most
  lines and steal its top line, and add it to the screen we're writing to.
  This means the screens "compete" and eat each other :)

'Stealing' means the screens will automatically balance, to some degree.
If you have 3 active channels in one bank, during busy periods the 3
screens will tend to be around the same size. If one channel goes quiet,
the other 2 will steal lines from it until it gets down to 23 lines,
then they'll start stealing from each other instead. Maybe the minimum
should be 46 or 69 lines (2 or 3 chunks), to avoid the scenario where
you leave the Atari connected while you sleep, and by morning 2 busy
channels have eaten all the 6+ hour old text in the quiet one (text that
came in an hour after you went to bed)?
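
Here's a sketch of the allocate-or-steal logic, under some assumptions:
the free list is terminated by a null pointer, the structs are trimmed
to the fields used here, get_line() and steal_top_line() are
hypothetical names, and the minimum screen size (23, 46 or 69 lines) is
*not* enforced yet:

```c
#include <assert.h>
#include <stddef.h>

typedef struct line_s {
    struct line_s *next;
    char data[40];
} line_t;

typedef struct {
    unsigned char pool;
    int line_count;
    line_t *line_list; /* head (bottom-most line), or &end_marker */
} screen_t;

typedef struct {
    unsigned free_count;
    line_t *free_list; /* NULL when the pool has no free lines */
} pool_t;

/* 'next' points to itself; the data doesn't matter for this sketch. */
static line_t end_marker = { &end_marker, { 0 } };

/* Detach and return the top (oldest) line of a screen. It's the tail
   of the list, so we have to walk the whole list to find it. */
static line_t *steal_top_line(screen_t *s)
{
    line_t **link = &s->line_list;
    line_t *ln;

    assert(s->line_count > 0);
    while ((*link)->next != &end_marker)
        link = &(*link)->next;
    ln = *link;
    *link = &end_marker; /* the line below it is the new top */
    s->line_count--;
    return ln;
}

/* Get a line for a screen in pool number pool_no: take one from the
   free list if possible, else steal from the tallest screen in the
   same pool. Returns NULL only if the pool has no lines at all. */
static line_t *get_line(pool_t *p, unsigned char pool_no,
                        screen_t *scr, int nscr)
{
    line_t *ln = p->free_list;
    screen_t *victim = NULL;
    int i;

    if (ln != NULL) {
        p->free_list = ln->next;
        p->free_count--;
        return ln;
    }
    for (i = 0; i < nscr; i++) {
        if (scr[i].pool == pool_no && scr[i].line_count > 0 &&
            (victim == NULL || scr[i].line_count > victim->line_count))
            victim = &scr[i];
    }
    return victim ? steal_top_line(victim) : NULL;
}
```

If the screen being written to is itself the tallest one in the pool,
it ends up stealing its own top line, so it behaves like a ring buffer.
The walk to the tail is the slow part; whether that matters depends on
how long the lists get.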

Closing a screen:

When a screen is closed, its lines are returned to the free lines list
in the pool. Since they're already a linked list, all that's needed is
to add the screen's 'head' to the end of the pool's free lines list,
and add the screen's line count to the pool's free line count.
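
A sketch of closing a screen, with the caveat that it does the splice
the other way around: it puts the screen's lines at the *front* of the
free list instead of the end. Either way one list has to be walked to
its tail, because the screen's top line points at the End Marker and
that link has to be replaced. screen_close() is a hypothetical name and
the structs are trimmed down:

```c
#include <assert.h>
#include <stddef.h>

typedef struct line_s {
    struct line_s *next;
    char data[40];
} line_t;

typedef struct {
    int line_count;
    line_t *line_list; /* head (bottom-most line), or &end_marker */
} screen_t;

typedef struct {
    unsigned free_count;
    line_t *free_list; /* NULL when the pool has no free lines */
} pool_t;

/* 'next' points to itself; the data doesn't matter for this sketch. */
static line_t end_marker = { &end_marker, { 0 } };

/* Return all of a screen's lines to its pool's free list. */
static void screen_close(screen_t *s, pool_t *p)
{
    line_t *top = s->line_list;

    if (top == &end_marker) { /* never written to: nothing to free */
        assert(s->line_count == 0);
        return;
    }
    while (top->next != &end_marker) /* find the screen's top line */
        top = top->next;
    top->next = p->free_list;        /* chain the old free list after it */
    p->free_list = s->line_list;     /* screen's head is the new free head */
    p->free_count += (unsigned)s->line_count;
    s->line_list = &end_marker;
    s->line_count = 0;
}
```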

Displaying the screen:

All the screens share the same display list, which lives in main memory.

The display list has an LMS for every line. The top 23 lines are for
the screen, the bottom two are the status bar or edit box (always the
same address; stored in main memory).

The LMS operands get set like so:

Switch to the screen's bank, then...

Starting at the screen's 'head' line, and the last LMS (bottom of
23-line area of the DL), walk the linked list of lines (which are in
bottom-first order) and the display list (backwards).

If we're scrolled up, just keep walking as many lines as we're
scrolled up (e.g. 23 for one chunk).

When we've walked to the first (bottom-most) line to display (which will
be the 'head' one, if we weren't scrolled back), write its address as
the current LMS's 16-bit operand, then move on to the next line
and the next LMS...

Repeat until we hit the end of the screen (the line count), or we
hit 23 LMSes. At this point, we're done.
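
Sketched in C, with an array of 23 pointers standing in for the 23 LMS
operands (on the Atari each entry would be the 16-bit address of the
line's data, written into the display list). build_rows() is a
hypothetical name. Note one simplification: instead of checking the
line count, this just always fills all 23 rows and relies on the End
Marker pointing to itself, so everything above the top line comes out
as End Marker rows:

```c
#include <assert.h>

#define SCREEN_ROWS 23

typedef struct line_s {
    struct line_s *next;
    char data[40];
} line_t;

typedef struct {
    int line_count;
    int scrollback_pos; /* 0 = not scrolled up */
    line_t *line_list;  /* head (bottom-most line), or &end_marker */
} screen_t;

/* 'next' points to itself; the data doesn't matter for this sketch. */
static line_t end_marker = { &end_marker, { 0 } };

/* Fill rows[0] (top of display) through rows[22] (bottom) with the
   lines to show, walking the list bottom-first and the rows backwards. */
static void build_rows(const screen_t *s, const line_t *rows[SCREEN_ROWS])
{
    const line_t *ln = s->line_list;
    int i;

    assert(s->scrollback_pos >= 0 && s->scrollback_pos <= s->line_count);
    for (i = 0; i < s->scrollback_pos; i++) /* skip the scrolled-off lines */
        ln = ln->next;
    for (i = SCREEN_ROWS - 1; i >= 0; i--) {
        rows[i] = ln;  /* this row's LMS operand */
        ln = ln->next; /* the end marker just yields itself again */
    }
}
```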

Ideally, we'll double-buffer the display list (2 of them, one
displaying while the other's being rewritten), and switch to the newly
modified one during the next VBLANK (just update SDLSTL/H and let
the OS do it). Note that we *don't* have to deal with banking in the
display list: we can only show one screen at a time, so we don't need
to bankswitch.

Switching screens, or scrolling back the current screen, will require
rebuilding the display list. There's no need to rebuild it every
frame (there'll be a 'dirty' flag that gets set when switching or
scrolling).

Scrolling the screen up (or down) is just a matter of setting the
screen's scroll height. It should *never* be set higher than the
screen's height, and probably the UI will increase by 23 for each
press of Start+Up. So if we have 30 lines, counting from 1 (top) to 30
(bottom), we're normally looking at lines 8 to 30. Scrolling up by
one chunk will show lines 1 to 7 at the bottom, then the rest of the
display (the top 2/3s or so) will all be the End Marker line, which
appears blank. At that point, it won't be possible to scroll again:
we're at 23, and adding another 23 would exceed the height of 30 lines,
so the attempt is just ignored.
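
The chunk-at-a-time scrolling rule might be sketched like this.
scroll_up() and scroll_down() are hypothetical names. One assumption
that's a bit stricter than the text above: a scroll is also ignored if
it would land *exactly* on the screen's height, since that would show
nothing but End Marker lines:

```c
#include <assert.h>

#define CHUNK_LINES 23

/* Return the new scrollback position after one Start+Up press. The
   attempt is ignored (position unchanged) if there's nothing left to
   show above the current chunk. */
static int scroll_up(int pos, int line_count)
{
    assert(pos >= 0 && pos <= line_count);
    if (pos + CHUNK_LINES >= line_count)
        return pos;
    return pos + CHUNK_LINES;
}

/* Return the new scrollback position after scrolling down one chunk;
   0 means we're back at the bottom (not scrolled up). */
static int scroll_down(int pos)
{
    assert(pos >= 0);
    return (pos > CHUNK_LINES) ? pos - CHUNK_LINES : 0;
}
```

For the 30-line example: scroll_up(0, 30) gives 23, and scroll_up(23, 30)
stays at 23.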


Memory layouts for typical machine sizes...

- A 48K 800 will only have pool 0, which will be either 16384 bytes
(~390 lines) if DOS is booted, or 22784 bytes (~542 lines) without
DOS. This is enough to have 7 or 8 screens with about 3 chunks
(69 lines) apiece, which is better than the existing fixed-buffer code
manages.

- A 64K 800XL/1200XL/65XE/XEGS will only have pool 0, which will be
either 20480 bytes (~487 lines) with DOS, or 26880 bytes (~640 lines)
without DOS.

- A 128K 130XE will have pool 0 as the XL does, plus another 4 pools
of 16K each. That's 2048 lines (with DOS), which could be organized
as e.g. 10 screens of 200 lines each, or 16 with 128 lines each, or 20
with 100 lines each.

- A 256K upgraded XL (13 banks, like the Rambo) with DOS will have 212K
(217088 bytes: 20480 in pool 0 plus 12 banks of 16384), or about 5168
lines. For 512K, roughly twice that. To really take advantage of 256K,
you'll actually have to create enough screens so that all the banks
are used (13 screens, one per pool, with 16K each except the one in
pool 0 gets more). 512K will allow 32 screens, each in its own pool,
with close to 400 lines of scrollback in each (~17 chunks per screen!)
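
The arithmetic behind these numbers, as a small sketch. The function
names are hypothetical. It assumes 42 bytes per line, 16384 bytes for
the basic pool 0, 4096 more on an XL/XE, 6400 more without DOS, and
16384 per extended bank. It gives upper bounds: the few lines lost at
each 4K boundary are ignored:

```c
#include <assert.h>

#define LINE_SIZE 42UL

/* Bytes available in pool 0 (main memory). */
static unsigned long pool0_bytes(int xl_xe, int dos_booted)
{
    unsigned long bytes = 16384UL; /* the basic 16K screen area */

    if (xl_xe)
        bytes += 4096UL;           /* RAM under the OS */
    if (!dos_booted)
        bytes += 6400UL;           /* $0700 to $2000 */
    return bytes;
}

/* Total bytes for lines across pool 0 and all extended banks. */
static unsigned long total_bytes(int xl_xe, int dos_booted,
                                 unsigned extended_banks)
{
    assert(extended_banks <= 255); /* bank numbers fit in one byte */
    return pool0_bytes(xl_xe, dos_booted) + 16384UL * extended_banks;
}

/* Upper bound on the number of 42-byte lines in that many bytes. */
static unsigned long max_lines(unsigned long bytes)
{
    return bytes / LINE_SIZE;
}
```

For example, max_lines(total_bytes(1, 1, 4)) is the 130XE-with-DOS
case: 86016 bytes, 2048 lines.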


Milestones: things that will have to happen to make this a reality.

1. First and foremost, FNChat needs to go on a diet! Lots of stuff
   to rewrite in asm, to shrink it down. Currently, it's right at
   21K, but it's really more because rx_buf, tx_buf, and the font eat
   another 2K (in the screen memory area, in lieu of an 8th screen),
   plus all 3 display lists. No point optimizing the existing screen
   code for size, though, it's going to be replaced. See doc/diet.txt
   for details/progress on this.

2. Split the code/BSS/etc into high and low segments, so it lives from
   $2000 to $3FFF (low, 8K) and $8000 to $BFFF (high, 16K). This
   puts the primary screen memory area right where the XE bankswitching
   needs it to be, and gives a *total* size of 24K for the client,
   including the font, display lists, and all buffers (except ones
   located in very low memory, $400-$6FF; these are the config and the
   editbox, and can stay where they are).

   The font and buffers have to be moved out of the new screen address
   space (the banking window at $4000). I'm thinking the font will be
   just below the bank area at $3C00 (it's 1K and has to sit on a 1K
   boundary), 2 512-byte buffers below that at $3800, display lists
   below that, etc. Leave about 6K for 'low
   code', the cc65 stack, and the BSS.  Everything else will be in a
   'high code' (and possibly 'high BSS') segment, from $8000 to $BFFF.

3. Rip out all the existing screen code and replace it with a simplified
   form of the new scheme. To start with, only 1-5 pools in bank 0, but
   the display list modification code can be completed. The rest of
   the code (especially irc.c) is gonna need changes, because it
   "knows" that there are "always" 7 screens. This stage includes
   defining new hotkeys for screen numbers above 7, and making the
   status bar variable sized (show only the number of screens we
   have enough RAM for, based on the minimum size being 23 or 46
   lines).

4. Learn more than I currently know about bankswitching (I know the
   130XE, but what's the difference between a Rambo and a Compy Shop
   upgrade? I *think* I know how to detect all the available banks,
   but what about when the self-test and/or BASIC bits are being
   used for bank bits instead? Do I want to even try to support
   the Axlon and Mosaic upgrades for the 800?)

5. Do a version that supports a 130XE (4 extra banks only). Get it
   well tested, fix the inevitable issues that are going to happen.
   This version should also still be usable on 48K/64K. This will
   probably involve adding the memory detection to the config
   segment (along with copying the OS to RAM if possible). It'll
   deposit the pools array somewhere in screen memory, and the
   client will memcpy() that to its pools[] (before scr_init()
   is called). No point keeping all that startup code in memory
   the whole time the client runs.

6. Add support for more banks (detection and use).

Rest of the file is C structs that define the stuff above. This is
just hypothetical code (final implementation may look different).

/* if each pool is 16K, that's 512K, not bad. however, overhead. maybe
   limit this to something like 20 (256K extended = 16, plus the big
   pool in main bank, plus the potential smaller pools at $0700 and
   $d800). */
#define MAX_POOLS 32

/* with 512K, we get one screen per pool.
   with 256K, up to 2 screens per pool.
   with 128K, up to 4.
   with 64K, we only get 1 large pool and a couple small ones.
*/
#define MAX_SCREENS 32

/* 42 bytes per line */
typedef struct line_s {
  struct line_s *next;
  char data[40];
} line_t;

/* the end marker line is a line_t, but it lives outside of any pool and
   has a 'next' pointer that points to itself. */

/* sizeof(screen_t) is 33 bytes... */
typedef struct {
  char title[25];
  char status;
  char pool;
  int line_count; /* can be above 255 */
  int scrollback_pos;
  line_t *line_list; /* head of a linked list */
} screen_t;

screen_t screens[MAX_SCREENS]; /* array is 1056 bytes (33 * 32) */

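/* u8 and u16 are assumed to be unsigned 8-bit and 16-bit types
   (on cc65: unsigned char and unsigned int). */
/* sizeof(pool_t) is 9 bytes */
typedef struct {
  u16 start; /* 0 = not in use */
  u16 end;
  u8 bank; /* probably this is just the PORTB value */
  u16 free_count; /* number of unused lines in the pool */
  line_t *free_list; /* when this is null, the pool has no free lines */
} pool_t;

/* this array is sizeof(pool_t) * MAX_POOLS = 9 * 32, so 288 bytes */
/* the code that builds this array (detects extended ram too),
   will live in the config segment. */
pool_t pools[MAX_POOLS];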

/* this function is responsible for counting the usable lines (the ones
   that don't cross a 4K boundary) and arranging them in a linked list
   that includes all the usable ones. I suppose it should bzero() the
   memory first. */
void add_pool(u8 bank, u16 start, u16 end);