From f0e0a74cbf43d771075ad2d801197b8072d5b15c Mon Sep 17 00:00:00 2001
From: "B. Watson"
Date: Tue, 17 Dec 2024 22:47:36 -0500
Subject: uxd.c: add overlong sequence detection; ver.rst: regenerate

---
 uxd.rst | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

(limited to 'uxd.rst')

diff --git a/uxd.rst b/uxd.rst
index 597084b..535177d 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -227,15 +227,21 @@ changed with the **-c** option (see above).
   Printable characters (except the space, U+0020) alternate between green and yellow.
 
 **purple**
-  Spaces and unprintable characters ("control" characters, newlines, tabs, etc).
-  These are printed as "visible" characters, e.g. ␣ for the space, ↵ for a newline.
-  Hopefully this is an improvement over the usual practice of printing these as periods, like
-  standard hex dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
+  Spaces and unprintable characters ("control" characters, newlines,
+  tabs, etc). These are printed as "visible" characters, e.g. ␣ for
+  the space, ↵ for a newline. Hopefully this is an improvement over
+  the usual practice of printing these as periods, like standard hex
+  dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
   as a purple letter B.
 
+  Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+  as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
+  purple.
+
 **red**
-  Invalid UTF-8 sequences. These are rendered with a red background, to make them
-  stand out. Examples of invalid sequences:
+  Invalid UTF-8 sequences. These are rendered as � (U+0FFD) with
+  a red background, to make them stand out. Examples of invalid
+  sequences:
 
   - Prefix bytes (>= 0x80) which are not followed by the correct number of
     continuation bytes (with their high 2 bits set to **10**).
@@ -319,11 +325,6 @@ it'll just have lots of red in the output.
 BUGS
 ====
 
-**uxd** doesn't check for overlong UTF-8 encodings (e.g. a character
-that could be a 1-byte sequence, but is encoded as 2 or more).
-Sequences like this really should be colorized in red. Technically,
-this means **uxd** supports WTF-8, not UTF-8.
-
 There should be options and/or a config file to change the colors,
 rather than baking them into the binary.
 
-- 
cgit v1.2.3
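
The overlong check named in the subject line can be sketched roughly as follows. This is only an illustration of the general technique, not the code from uxd.c: the names `is_overlong` and `min_cp` are hypothetical, and the real implementation may be structured quite differently. The idea is to decode the sequence, then compare the resulting codepoint against the smallest value that legitimately needs that many bytes. (For reference, the � glyph mentioned in the documentation is the Unicode replacement character, U+FFFD.)

```c
#include <stddef.h>

/* Hypothetical helper, NOT taken from uxd.c.
 * Returns 1 if the first sequence in s (n bytes available) has a valid
 * prefix and valid continuation bytes, but encodes a codepoint that
 * would have fit in a shorter sequence. Returns 0 otherwise, including
 * for truncated or structurally invalid input. */
int is_overlong(const unsigned char *s, size_t n)
{
    /* min_cp[len] = smallest codepoint that needs len bytes (index 0 unused) */
    static const unsigned long min_cp[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    unsigned long cp;
    size_t len, i;

    if (n == 0)
        return 0;

    /* Work out the sequence length and payload bits from the prefix byte. */
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0; /* stray continuation byte or invalid prefix */

    if (n < len)
        return 0; /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    return cp < min_cp[len];
}
```

With this sketch, the two-byte sequence C0 80 (an overlong encoding of U+0000) is flagged, while C2 80 (the shortest encoding of U+0080) is not. A decoder built this way could keep overlong sequences distinct from other invalid input, which is what the separate purple and red renderings in the documentation suggest.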