uxd.c: add overlong sequence detection; ver.rst: regenerate

author: B. Watson <urchlay@slackware.uk> 2024-12-17 22:47:36 -0500
committer: B. Watson <urchlay@slackware.uk> 2024-12-17 22:47:57 -0500
commit: f0e0a74cbf43d771075ad2d801197b8072d5b15c (patch)
tree: 71d2f41619aa4cc39487c850a59e97f90895669b /uxd.rst
parent: 548e7d04b4b2fa60b71615ed590be54016dac52d (diff)
download: uxd-f0e0a74cbf43d771075ad2d801197b8072d5b15c.tar.gz
1 files changed, 12 insertions, 11 deletions
diff --git a/uxd.rst b/uxd.rst
index 597084b..535177d 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -227,15 +227,21 @@ changed with the **-c** option (see above).
   Printable characters (except the space, U+0020) alternate between green and yellow.
 
 **purple**
-  Spaces and unprintable characters ("control" characters, newlines, tabs, etc).
-  These are printed as "visible" characters, e.g. ␣ for the space, ↵ for a newline.
-  Hopefully this is an improvement over the usual practice of printing these as periods, like
-  standard hex dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
+  Spaces and unprintable characters ("control" characters, newlines,
+  tabs, etc).  These are printed as "visible" characters, e.g. ␣ for
+  the space, ↵ for a newline.  Hopefully this is an improvement over
+  the usual practice of printing these as periods, like standard hex
+  dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
   as a purple letter B.
 
+  Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+  as 2 or more bytes) are rendered as � (U+FFFD) in reverse video
+  purple.
+
 **red**
-  Invalid UTF-8 sequences. These are rendered with a red background, to make them
-  stand out. Examples of invalid sequences:
+  Invalid UTF-8 sequences. These are rendered as � (U+FFFD) with
+  a red background, to make them stand out. Examples of invalid
+  sequences:
 
     - Prefix bytes (>= 0x80) which are not followed by the correct number of continuation
       bytes (with their high 2 bits set to **10**).
@@ -319,11 +325,6 @@ it'll just have lots of red in the output.
 BUGS
 ====
 
-**uxd** doesn't check for overlong UTF-8 encodings (e.g. a character
-that could be a 1-byte sequence, but is encoded as 2 or more).
-Sequences like this really should be colorized in red. Technically,
-this means **uxd** supports WTF-8, not UTF-8.
-
 There should be options and/or a config file to change the colors,
 rather than baking them into the binary.
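
Note on the overlong check described in the hunk above: the following is a minimal sketch of how such a test can be written in C. It is not taken from uxd.c; the names decode_payload() and is_overlong() are hypothetical, and the sketch assumes the caller has already confirmed that the lead byte and continuation bytes are well-formed.

```c
#include <stdint.h>

/* Hypothetical helper (not from uxd.c): extract the payload bits of a
   2-, 3- or 4-byte UTF-8 sequence. Assumes len is 2..4 and that the
   lead and continuation bytes were already validated by the caller. */
uint32_t decode_payload(const unsigned char *s, int len) {
    static const unsigned char lead_mask[] = { 0, 0, 0x1F, 0x0F, 0x07 };
    uint32_t cp = s[0] & lead_mask[len];
    for (int i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);   /* 6 payload bits per continuation byte */
    return cp;
}

/* Hypothetical helper (not from uxd.c): returns 1 if a codepoint that
   arrived in `len` bytes would have fit in a shorter sequence, i.e.
   the encoding is overlong; returns 0 otherwise. */
int is_overlong(uint32_t cp, int len) {
    switch (len) {
        case 2:  return cp < 0x80;     /* fits in 1 byte  */
        case 3:  return cp < 0x800;    /* fits in 2 bytes */
        case 4:  return cp < 0x10000;  /* fits in 3 bytes */
        default: return 0;             /* 1-byte sequences cannot be overlong */
    }
}
```

Under these assumptions, the bytes C0 80 decode to U+0000 in 2 bytes and are flagged as overlong, while E2 90 A3 (␣, U+2423) decodes to a value that needs all 3 bytes and is not.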