From 72e6cbfc6a4b4606f11a6d5285de65238dc0dbd4 Mon Sep 17 00:00:00 2001
From: "B. Watson"
Date: Wed, 9 Nov 2022 17:12:04 -0500
Subject: Added dla2csv, dla2img. Split up docs.

---
 NOTES.txt | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 NOTES.txt

(limited to 'NOTES.txt')

diff --git a/NOTES.txt b/NOTES.txt
new file mode 100644
index 0000000..f6b0a72
--- /dev/null
+++ b/NOTES.txt
@@ -0,0 +1,54 @@
+dla.xex notes
+-------------
+
+The way I use ca65 is unusual: I build Atari code with "-t none"
+instead of "-t atari", and I generate my own XEX file headers (see
+xex.inc) instead of using the cc65 linker. This is because cc65's
+default atari linker script doesn't support multi-segment executables,
+and it's a royal PITA to write a custom cc65 linker script. Also
+because it makes ca65 behave more like Mac/65, which was my go-to
+assembler back in the old days.
+
+It might be possible to optimize this a bit further, maybe shave a
+few percent off the run time. drunkwalk.s contains the innermost loop,
+which has been unrolled and cycle-counted.
+
+During generation, the ANTIC chip's DMA is disabled, to speed things
+up. There wouldn't be anything to see anyway: the generation process
+works with unpacked pixels (one per byte), so the ANTIC couldn't
+display them properly. At the end, when all the particles are done,
+the unpacked pixels are packed into bytes for display. Using unpacked
+pixels is faster than doing all the shift-and-mask operations needed
+for packed pixels, but it also uses a lot of memory (48K required, so
+it won't run on my poor old 400).
+
+Packing the pixels is a slow process, takes about 0.6 seconds. Since
+it happens only once at the end of a 3+ minute process, it's probably
+not worth trying to optimize. See render.s. Also, at the start of
+generation, 28K of memory has to be cleared, which takes 0.3 seconds.
+
+There might be a quick way to limit the particles' movement outside
+the initial circle's radius. Right now, it's limited to a square area;
+width and height are the diameter of the circle plus 10 pixels. The
+corners of this square waste a lot of time; it'd be better to come up
+with a way to do an octagon (the square with the corners cut off),
+which shouldn't slow down the inner loop too much... I actually did
+implement this, but it was too slow (the time spent in calculations
+was longer than the time saved by doing them).
+
+Rather than calculate points on a circle in asm code, the tables of
+points for the 4 circle sizes are pre-calculated by a perl script
+and included in the executable verbatim. The tables bloat the code
+some (2KB), but the speed boost is well worth it. Also, the graphics
+mode used is "graphics 8", but in ANTIC narrow playfield mode, so
+the X resolution is 256... meaning I don't need two bytes for the X
+cursor position (which saves a good bit of time). The code that plots
+pixels doesn't use CIO to do so (it writes directly to the screen
+memory), which also saves time. There's no floating point math in the
+generation process: if there were, the asm version wouldn't be all
+that much faster than the BASIC one...
+
+It *does* use floating point to print integers (the default number of
+particles in the prompt) and calculate the elapsed time in mmss.s. I
+thought it would be easier to code that way; I'd forgot what a PITA
+the FP ROM is. It works now, so I won't change it.
-- 
cgit v1.2.3
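
Illustration (not part of the patch): the notes above contrast unpacked
pixels (one per byte) with the packed, 1-bit-per-pixel layout that ANTIC
displays. The Python sketch below models that trade-off. It is not a
translation of render.s or drunkwalk.s. The function names are
hypothetical, and it assumes the usual "graphics 8" bit order, with the
leftmost pixel in bit 7 of each byte. The 256-pixel width comes from the
narrow playfield mentioned in the notes.

```python
# Sketch only: models the idea in Python, not the 6502 code in render.s.
WIDTH = 256                  # X resolution in narrow playfield mode, per the notes
BYTES_PER_ROW = WIDTH // 8   # 1 bit per pixel, so 32 bytes per scanline


def pack_row(unpacked):
    """Pack one row of 0/1 pixel bytes into 1-bit-per-pixel bytes.

    Assumes the leftmost pixel lands in bit 7 of the first byte.
    """
    if len(unpacked) != WIDTH:
        raise ValueError("row must hold exactly WIDTH pixels")
    packed = bytearray(BYTES_PER_ROW)
    for x, pixel in enumerate(unpacked):
        if pixel:
            # x >> 3 selects the byte, x & 7 selects the bit within it.
            packed[x >> 3] |= 0x80 >> (x & 7)
    return bytes(packed)


def pixel_is_set(packed, x):
    """Test one pixel in a packed row.

    This is the per-pixel shift-and-mask work that the unpacked
    layout avoids during generation.
    """
    return bool(packed[x >> 3] & (0x80 >> (x & 7)))
```

With an unpacked row, testing or setting a pixel is a single indexed
byte access (row[x]), and because x fits in one byte the index needs no
16-bit arithmetic. The packed form needs the shift and mask shown in
pixel_is_set on every access, which is why it is cheaper to pack once at
the end.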