From c205a7ea2a7171b61dae4ac51a3a251cceb1dde1 Mon Sep 17 00:00:00 2001
From: "B. Watson" <urchlay@slackware.uk>
Date: Wed, 18 Dec 2024 05:47:07 -0500
Subject: detect UTF-16 surrogates as bad, use red for overlong

---
 uxd.rst | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
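
Reviewer note (not part of the patch): the invalid-sequence categories listed in the hunk below can be sketched as a small classifier. This is an illustration only, not uxd's actual code; the function name `classify_utf8`, the table `MIN_CP`, and the returned labels are all hypothetical and chosen just for this example.

```python
# Hypothetical sketch: classify the UTF-8 sequence that starts at seq[0].
# The labels mirror the categories in the man page text; this is not uxd's code.

# Smallest codepoint that legitimately needs a sequence of the given length.
MIN_CP = {2: 0x80, 3: 0x800, 4: 0x10000}


def classify_utf8(seq: bytes) -> str:
    if not seq:
        return "empty"
    b0 = seq[0]
    if b0 < 0x80:
        return "ok"                     # plain ASCII, single byte
    if b0 < 0xC0:
        return "stray continuation"     # 10xxxxxx with no prefix byte before it
    if b0 < 0xE0:
        length, cp = 2, b0 & 0x1F
    elif b0 < 0xF0:
        length, cp = 3, b0 & 0x0F
    elif b0 < 0xF8:
        length, cp = 4, b0 & 0x07
    else:
        return "bad prefix"             # 0xF8..0xFF never start a sequence

    # Every following byte must have its high 2 bits set to 10.
    for b in seq[1:length]:
        if b & 0xC0 != 0x80:
            return "bad continuation"
        cp = (cp << 6) | (b & 0x3F)

    if len(seq) < length:
        return "truncated"              # input ended in mid-sequence
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"              # UTF-16 surrogate range
    if cp > 0x10FFFF:
        return "too large"              # disallowed by RFC 3629
    if cp < MIN_CP[length]:
        return "overlong"               # would fit in a shorter encoding
    return "ok"
```

For example, `classify_utf8(b"\xc0\x80")` reports an overlong encoding of U+0000, and `classify_utf8(b"\xed\xa0\x80")` reports a surrogate (U+D800).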

diff --git a/uxd.rst b/uxd.rst
index 535177d..1789efe 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -234,14 +234,10 @@ changed with the **-c** option (see above).
   dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
   as a purple letter B.
 
-  Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
-  as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
-  purple.
-
 **red**
   Invalid UTF-8 sequences. These are rendered as � (U+0FFD) with
-  a red background, to make them stand out. Examples of invalid
-  sequences:
+  a red background, to make them stand out. Invalid
+  sequences are:
 
     - Prefix bytes (>= 0x80) which are not followed by the correct number of continuation
       bytes (with their high 2 bits set to **10**).
@@ -250,8 +246,16 @@ changed with the **-c** option (see above).
 
     - Truncated UTF-8 sequence at EOF.
 
+    - UTF-16 surrogates (codepoints U+D800 to U+DFFF).
+
     - Codepoints above U+10FFFF, which are disallowed by RFC 3629.
 
+    - Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+      as 2 or more bytes).
+
+    Each occurrence of any of the above increments the "Bad
+    Sequences" count when the **-i** option is used.
+
 TERMINAL SUPPORT
 ================
 
-- 
cgit v1.2.3