author | B. Watson <urchlay@slackware.uk> | 2024-12-18 05:47:07 -0500 |
---|---|---|
committer | B. Watson <urchlay@slackware.uk> | 2024-12-18 05:47:07 -0500 |
commit | c205a7ea2a7171b61dae4ac51a3a251cceb1dde1 | |
tree | 58447b4934f93eb8cb48909fc1efc3b15c72c5ed /uxd.1 | |
parent | f467fec27bc25d51020ce482750361c102417efb | |
detect UTF-16 surrogates as bad, use red for overlong
Diffstat (limited to 'uxd.1')
-rw-r--r-- | uxd.1 | 18 |
1 files changed, 11 insertions, 7 deletions
```diff
@@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
 .in \\n[rst2man-indent\\n[rst2man-indent-level]]u
 ..
-.TH "UXD" 1 "2024-12-17" "0.2.1" "Urchlay's Utilities"
+.TH "UXD" 1 "2024-12-18" "0.2.1" "Urchlay's Utilities"
 .SH NAME
 uxd \- UTF-8 hex dumper
 .SH SYNOPSIS
@@ -276,15 +276,11 @@ the space, ↵ for a newline.
 Hopefully this is an improvement over the usual practice of printing
 these as periods, like standard hex dumpers do.
 The Unicode BOM (byte order marker, U+FEFF) is printed as a purple letter B.
-.sp
-Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
-as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
-purple.
 .TP
 .B \fBred\fP
 Invalid UTF\-8 sequences. These are rendered as � (U+0FFD) with
-a red background, to make them stand out. Examples of invalid
-sequences:
+a red background, to make them stand out. Invalid
+sequences are:
 .INDENT 7.0
 .INDENT 3.5
 .INDENT 0.0
@@ -296,8 +292,16 @@ Continuation bytes that aren\(aqt preceded by a valid prefix byte.
 .IP \(bu 2
 Truncated UTF\-8 sequence at EOF.
 .IP \(bu 2
+UTF\-16 surrogates (codepoints U+D800 to U+DFFF).
+.IP \(bu 2
 Codepoints above U+10FFFF, which are disallowed by RFC 3629.
+.IP \(bu 2
+Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+as 2 or more bytes).
 .UNINDENT
+.sp
+Each occurrence of any of the above will increment the "Bad
+Sequences" count, if the \fB\-i\fP option is used.
 .UNINDENT
 .UNINDENT
 .UNINDENT
```
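After this change, the man page classifies UTF-16 surrogates and overlong encodings as invalid (red), alongside orphaned continuation bytes, truncated sequences and codepoints above U+10FFFF. Each of these increments the "Bad Sequences" count under `-i`. The Python sketch below shows one way to detect each category. It is not uxd's implementation. The function names `classify_utf8` and `count_bad` and the status labels are hypothetical. The diff shows only part of the man page's list, so two categories are assumptions made for completeness: "short" covers a prefix followed by a non-continuation byte in mid-stream, and "bad_prefix" covers bytes 0xF8 to 0xFF.

```python
# Minimal sketch of a UTF-8 sequence classifier (hypothetical names, not
# taken from uxd). "short" and "bad_prefix" are assumed extra categories.

# Smallest codepoint that genuinely needs each sequence length; anything
# below it is an overlong encoding.
MIN_CODEPOINT = {2: 0x80, 3: 0x800, 4: 0x10000}


def classify_utf8(data: bytes):
    """Yield (offset, length, status) for each byte sequence in data."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            yield i, 1, "ok"
            i += 1
            continue
        if b <= 0xBF:                     # continuation byte with no prefix
            yield i, 1, "orphan"
            i += 1
            continue
        if b <= 0xDF:                     # 110xxxxx: 2-byte prefix
            length, cp = 2, b & 0x1F
        elif b <= 0xEF:                   # 1110xxxx: 3-byte prefix
            length, cp = 3, b & 0x0F
        elif b <= 0xF7:                   # 11110xxx: 4-byte prefix
            length, cp = 4, b & 0x07
        else:                             # 0xF8-0xFF never start a sequence
            yield i, 1, "bad_prefix"
            i += 1
            continue

        # Accumulate 6 payload bits from each continuation byte that follows.
        j = i + 1
        while j < i + length and j < n and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j < i + length:                # ran out of continuation bytes
            yield i, j - i, "truncated" if j == n else "short"
            i = j
            continue

        if cp < MIN_CODEPOINT[length]:
            status = "overlong"           # e.g. C0 80 encoding U+0000
        elif 0xD800 <= cp <= 0xDFFF:
            status = "surrogate"          # UTF-16 surrogate half
        elif cp > 0x10FFFF:
            status = "too_big"            # beyond the RFC 3629 limit
        else:
            status = "ok"
        yield i, length, status
        i += length


def count_bad(data: bytes) -> int:
    """Count every sequence whose status is not "ok"."""
    return sum(1 for _, _, status in classify_utf8(data) if status != "ok")


if __name__ == "__main__":
    # Overlong NUL, a high surrogate, and an orphan continuation byte.
    print(count_bad(b"ok \xc0\x80 \xed\xa0\x80 \x80"))  # prints 3
```

The overlong check runs first, and each check compares the fully decoded codepoint rather than pattern-matching byte ranges. This keeps the rules close to the wording of the man page: a 2-byte sequence decoding below U+0080 is overlong, a 3-byte sequence landing in U+D800 to U+DFFF is a surrogate, and a 4-byte sequence above U+10FFFF is out of range.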