author | B. Watson <urchlay@slackware.uk> | 2024-12-18 05:47:07 -0500 |
---|---|---|
committer | B. Watson <urchlay@slackware.uk> | 2024-12-18 05:47:07 -0500 |
commit | c205a7ea2a7171b61dae4ac51a3a251cceb1dde1 (patch) | |
tree | 58447b4934f93eb8cb48909fc1efc3b15c72c5ed /uxd.rst | |
parent | f467fec27bc25d51020ce482750361c102417efb (diff) | |
download | uxd-c205a7ea2a7171b61dae4ac51a3a251cceb1dde1.tar.gz |
detect UTF-16 surrogates as bad, use red for overlong
Diffstat (limited to 'uxd.rst')
-rw-r--r-- | uxd.rst | 16 |
1 files changed, 10 insertions, 6 deletions
@@ -234,14 +234,10 @@ changed with the **-c** option (see above).
   dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
   as a purple letter B.
 
-  Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
-  as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
-  purple.
-
 **red**
   Invalid UTF-8 sequences. These are rendered as � (U+0FFD) with
-  a red background, to make them stand out. Examples of invalid
-  sequences:
+  a red background, to make them stand out. Invalid
+  sequences are:
 
   - Prefix bytes (>= 0x80) which are not followed by the correct
     number of continuation bytes (with their high 2 bits set to **10**).
@@ -250,8 +246,16 @@ changed with the **-c** option (see above).
 
   - Truncated UTF-8 sequence at EOF.
 
+  - UTF-16 surrogates (codepoints U+D800 to U+DFFF).
+
   - Codepoints above U+10FFFF, which are disallowed by RFC 3629.
 
+  - Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+    as 2 or more bytes).
+
+  Each occurrence of any of the above will increment the "Bad
+  Sequences" count, if the **-i** option is used.
+
 
 TERMINAL SUPPORT
 ================
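
For readers who want to see the categories in this hunk concretely, here is a minimal Python sketch that classifies the UTF-8 sequence at the start of a byte string. It is not uxd's code: the function name `classify_utf8` and the returned reason strings are hypothetical, and the treatment of bytes 0xF8 and above as invalid prefixes is an assumption based on RFC 3629, not on anything shown in this diff.

```python
def classify_utf8(seq: bytes) -> str:
    """Classify the UTF-8 sequence at the start of seq.

    Returns "ok" or a short reason string. Hypothetical helper for
    illustration only; the names and strings are not taken from uxd.
    """
    if not seq:
        return "empty"
    b0 = seq[0]
    if b0 < 0x80:
        return "ok"  # plain ASCII, one byte
    if b0 < 0xC0:
        return "stray continuation byte"  # 10xxxxxx with no prefix byte
    if b0 < 0xE0:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif b0 < 0xF0:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif b0 < 0xF8:
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        # Assumption: RFC 3629 has no sequences longer than 4 bytes.
        return "invalid prefix byte"

    for i in range(1, length):
        if i >= len(seq):
            return "truncated"
        if seq[i] & 0xC0 != 0x80:  # high 2 bits must be 10
            return "bad continuation byte"
        cp = (cp << 6) | (seq[i] & 0x3F)

    if cp < minimum:
        return "overlong"  # codepoint fits in a shorter encoding
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"  # reserved for UTF-16
    if cp > 0x10FFFF:
        return "above U+10FFFF"
    return "ok"
```

For example, `classify_utf8(b"\xc0\x80")` reports an overlong encoding of U+0000, and `classify_utf8(b"\xed\xa0\x80")` reports the surrogate U+D800. Under the documentation added in this commit, each such case would be drawn in red and counted as a bad sequence when **-i** is used.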