Diffstat (limited to 'uxd.rst')
-rw-r--r-- | uxd.rst | 16 |
1 files changed, 10 insertions, 6 deletions
@@ -234,14 +234,10 @@ changed with the **-c** option (see above).
     dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
     as a purple letter B.
 
-    Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
-    as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
-    purple.
-
 **red**
     Invalid UTF-8 sequences. These are rendered as � (U+0FFD) with
-    a red background, to make them stand out. Examples of invalid
-    sequences:
+    a red background, to make them stand out. Invalid
+    sequences are:
 
     - Prefix bytes (>= 0x80) which are not followed by the correct
       number of continuation bytes (with their high 2 bits set to **10**).
@@ -250,8 +246,16 @@ changed with the **-c** option (see above).
 
     - Truncated UTF-8 sequence at EOF.
 
+    - UTF-16 surrogates (codepoints U+D800 to U+DFFF).
+    - Codepoints above U+10FFFF, which are disallowed by RFC 3629.
+    - Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+      as 2 or more bytes).
+
+    Each occurrence of any of the above will increment the "Bad
+    Sequences" count, if the **-i** option is used.
+
 
 TERMINAL SUPPORT
 ================
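The invalid-sequence categories listed in the patched text can be illustrated with a small counter. This is a minimal Python sketch, not uxd's actual implementation: the function name `count_bad_sequences`, the choice to count one bad sequence per malformed unit, and the resynchronisation policy (resume at the first byte that was not consumed) are all assumptions made for the example.

```python
# Hypothetical sketch: count invalid UTF-8 sequences using the categories
# named in the patch. How uxd itself counts and resynchronises is assumed.

# Smallest codepoint that legitimately needs N continuation bytes.
MIN_CODEPOINT = {1: 0x80, 2: 0x800, 3: 0x10000}


def count_bad_sequences(data: bytes) -> int:
    bad = 0
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            i += 1                      # plain ASCII is always valid
            continue
        if 0xC0 <= b0 <= 0xDF:
            need, cp = 1, b0 & 0x1F     # 2-byte sequence
        elif 0xE0 <= b0 <= 0xEF:
            need, cp = 2, b0 & 0x0F     # 3-byte sequence
        elif 0xF0 <= b0 <= 0xF7:
            need, cp = 3, b0 & 0x07     # 4-byte sequence
        else:
            # Stray continuation byte (0x80-0xBF) or impossible prefix
            # (0xF8-0xFF): count it and move on one byte.
            bad += 1
            i += 1
            continue

        # Consume up to `need` continuation bytes (high two bits == 10).
        j = i + 1
        while j < n and j - i <= need and (data[j] & 0xC0) == 0x80:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1

        if j - i - 1 < need:
            # Too few continuation bytes, including truncation at EOF.
            bad += 1
        elif (
            cp < MIN_CODEPOINT[need]        # overlong encoding
            or 0xD800 <= cp <= 0xDFFF       # UTF-16 surrogate
            or cp > 0x10FFFF                # beyond the RFC 3629 limit
        ):
            bad += 1
        i = j                               # assumed resync point
    return bad
```

For instance, `count_bad_sequences(b"\xc0\x80")` treats the two-byte encoding of U+0000 as a single overlong sequence, while a UTF-8 BOM (`b"\xef\xbb\xbf"`) is accepted as valid.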