author B. Watson <urchlay@slackware.uk> 2024-12-18 05:47:07 -0500
committer B. Watson <urchlay@slackware.uk> 2024-12-18 05:47:07 -0500
commit c205a7ea2a7171b61dae4ac51a3a251cceb1dde1
tree 58447b4934f93eb8cb48909fc1efc3b15c72c5ed /uxd.rst
parent f467fec27bc25d51020ce482750361c102417efb
detect UTF-16 surrogates as bad, use red for overlong
Diffstat (limited to 'uxd.rst')
-rw-r--r-- uxd.rst 16
1 files changed, 10 insertions, 6 deletions
diff --git a/uxd.rst b/uxd.rst
index 535177d..1789efe 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -234,14 +234,10 @@ changed with the **-c** option (see above).
dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
as a purple letter B.
- Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
- as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
- purple.
-
**red**
Invalid UTF-8 sequences. These are rendered as � (U+0FFD) with
- a red background, to make them stand out. Examples of invalid
- sequences:
+ a red background, to make them stand out. Invalid
+ sequences are:
- Prefix bytes (>= 0x80) which are not followed by the correct number of continuation
bytes (with their high 2 bits set to **10**).
@@ -250,8 +246,16 @@ changed with the **-c** option (see above).
- Truncated UTF-8 sequence at EOF.
+ - UTF-16 surrogates (codepoints U+D800 to U+DFFF).
+
- Codepoints above U+10FFFF, which are disallowed by RFC 3629.
+ - Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+ as 2 or more bytes).
+
+ Each occurrence of any of the above will increment the "Bad
+ Sequences" count, if the **-i** option is used.
+
TERMINAL SUPPORT
================
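
Note on the rules documented in this change: the sketch below illustrates how the listed categories of invalid sequence (bad prefix or continuation bytes, truncation, UTF-16 surrogates, codepoints above U+10FFFF, overlong encodings) can be told apart. It is not taken from uxd's source. The function name `classify`, the table `MIN_FOR_LENGTH`, and the returned labels are all hypothetical, and uxd itself may check these conditions in a different order or consume bytes differently.

```python
# Hypothetical illustration only; not uxd's actual implementation.

# Smallest codepoint that legitimately needs a sequence of each length.
# Anything below this for a given length is an overlong encoding.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def classify(seq: bytes) -> str:
    """Return 'ok' or the reason the UTF-8 sequence at the start of seq is invalid.

    Only the first sequence is examined; any bytes after it are ignored.
    """
    if not seq:
        return "truncated"

    first = seq[0]
    if first < 0x80:
        length, cp = 1, first
    elif 0xC0 <= first <= 0xDF:
        length, cp = 2, first & 0x1F
    elif 0xE0 <= first <= 0xEF:
        length, cp = 3, first & 0x0F
    elif 0xF0 <= first <= 0xF7:
        length, cp = 4, first & 0x07
    else:
        # 0x80-0xBF is a stray continuation byte;
        # 0xF8-0xFF cannot start a sequence.
        return "bad prefix"

    if len(seq) < length:
        return "truncated"

    for b in seq[1:length]:
        # Continuation bytes must have their high 2 bits set to 10.
        if b & 0xC0 != 0x80:
            return "bad continuation"
        cp = (cp << 6) | (b & 0x3F)

    if cp < MIN_FOR_LENGTH[length]:
        return "overlong"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp > 0x10FFFF:
        return "out of range"
    return "ok"
```

For instance, under these assumptions `classify(b"\xc0\x80")` reports an overlong encoding of U+0000, and `classify(b"\xed\xa0\x80")` reports the surrogate U+D800.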