From 72e6cbfc6a4b4606f11a6d5285de65238dc0dbd4 Mon Sep 17 00:00:00 2001
From: "B. Watson" <urchlay@slackware.uk>
Date: Wed, 9 Nov 2022 17:12:04 -0500
Subject: Added dla2csv, dla2img. Split up docs.

---
 NOTES.txt | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 NOTES.txt

(limited to 'NOTES.txt')

diff --git a/NOTES.txt b/NOTES.txt
new file mode 100644
index 0000000..f6b0a72
--- /dev/null
+++ b/NOTES.txt
@@ -0,0 +1,54 @@
+dla.xex notes
+-------------
+
+The way I use ca65 is unusual: I build Atari code with "-t none"
+instead of "-t atari", and I generate my own XEX file headers (see
+xex.inc) instead of using the cc65 linker. This is because cc65's
+default atari linker script doesn't support multi-segment executables,
+and it's a royal PITA to write custom cc65 linker scripts. It also
+makes ca65 behave more like Mac/65, which was my go-to assembler back
+in the old days.
+
+It might be possible to optimize this a bit further, maybe shave a
+few percent off the run time. drunkwalk.s contains the innermost loop,
+which has been unrolled and cycle-counted.
+
+During generation, the ANTIC chip's DMA is disabled, to speed things
+up. There wouldn't be anything to see anyway: the generation process
+works with unpacked pixels (one per byte), so the ANTIC couldn't
+display them properly. At the end, when all the particles are done,
+the unpacked pixels are packed into bytes for display. Using unpacked
+pixels is faster than doing all the shift-and-mask operations needed
+for packed pixels, but it also uses a lot of memory (48K required, so
+it won't run on my poor old 400).
+
+Packing the pixels is a slow process: it takes about 0.6 seconds. Since
+it happens only once at the end of a 3+ minute process, it's probably
+not worth trying to optimize. See render.s. Also, at the start of
+generation, 28K of memory has to be cleared, which takes 0.3 seconds.
+
+There might be a quick way to keep the particles from wandering far
+outside the initial circle. Right now, they're limited to a square area;
+width and height are the diameter of the circle plus 10 pixels. The
+corners of this square waste a lot of time; it'd be better to come up
+with a way to do an octagon (the square with the corners cut off),
+which shouldn't slow down the inner loop too much... I actually did
+implement this, but it was too slow (the time spent in calculations
+was longer than the time saved by doing them).
+
+Rather than calculate points on a circle in asm code, the tables of
+points for the 4 circle sizes are pre-calculated by a perl script
+and included in the executable verbatim. The tables bloat the code
+some (2KB), but the speed boost is well worth it. Also, the graphics
+mode used is "graphics 8", but in ANTIC narrow playfield mode, so
+the X resolution is 256... meaning I don't need two bytes for the X
+cursor position (which saves a good bit of time). The code that plots
+pixels doesn't use CIO to do so (it writes directly to the screen
+memory), which also saves time. There's no floating point math in the
+generation process: if there were, the asm version wouldn't be all
+that much faster than the BASIC one...
+
+It *does* use floating point to print integers (the default number of
+particles in the prompt) and calculate the elapsed time in mmss.s. I
+thought it would be easier to code that way; I'd forgotten what a PITA
+the FP ROM is. It works now, so I won't change it.
-- 
cgit v1.2.3