author B. Watson <urchlay@slackware.uk> 2024-12-18 05:47:07 -0500
committer B. Watson <urchlay@slackware.uk> 2024-12-18 05:47:07 -0500
commit c205a7ea2a7171b61dae4ac51a3a251cceb1dde1
tree 58447b4934f93eb8cb48909fc1efc3b15c72c5ed /uxd.1
parent f467fec27bc25d51020ce482750361c102417efb
detect UTF-16 surrogates as bad, use red for overlong
Diffstat (limited to 'uxd.1')
-rw-r--r-- uxd.1 18
1 files changed, 11 insertions, 7 deletions
diff --git a/uxd.1 b/uxd.1
index ca39178..90b23a3 100644
--- a/uxd.1
+++ b/uxd.1
@@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
-.TH "UXD" 1 "2024-12-17" "0.2.1" "Urchlay's Utilities"
+.TH "UXD" 1 "2024-12-18" "0.2.1" "Urchlay's Utilities"
.SH NAME
uxd \- UTF-8 hex dumper
.SH SYNOPSIS
@@ -276,15 +276,11 @@ the space, ↵ for a newline. Hopefully this is an improvement over
the usual practice of printing these as periods, like standard hex
dumpers do. The Unicode BOM (byte order marker, U+FEFF) is printed
as a purple letter B.
-.sp
-Note: Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
-as 2 or more bytes) are rendered as � (U+0FFD) in reverse video
-purple.
.TP
.B \fBred\fP
Invalid UTF\-8 sequences. These are rendered as � (U+0FFD) with
-a red background, to make them stand out. Examples of invalid
-sequences:
+a red background, to make them stand out. Invalid
+sequences are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
@@ -296,8 +292,16 @@ Continuation bytes that aren\(aqt preceded by a valid prefix byte.
.IP \(bu 2
Truncated UTF\-8 sequence at EOF.
.IP \(bu 2
+UTF\-16 surrogates (codepoints U+D800 to U+DFFF).
+.IP \(bu 2
Codepoints above U+10FFFF, which are disallowed by RFC 3629.
+.IP \(bu 2
+Overlong encodings (e.g. codepoints U+0000 to U+007F encoded
+as 2 or more bytes).
.UNINDENT
+.sp
+Each occurrence of any of the above will increment the "Bad
+Sequences" count, if the \fB\-i\fP option is used.
.UNINDENT
.UNINDENT
.UNINDENT
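
The invalid-sequence categories listed in the hunk above can be illustrated with a small classifier. This is only a sketch of the checks the man page describes, not uxd's actual code: the function name `classify_utf8`, the label strings, and the treatment of bytes 0xF8 to 0xFF as invalid prefixes are all assumptions made for the example.

```python
def classify_utf8(seq: bytes) -> str:
    """Classify the UTF-8 sequence that starts at seq[0].

    Hypothetical helper; the labels are illustrative, not uxd output.
    """
    if not seq:
        return "empty"
    b0 = seq[0]
    if b0 < 0x80:
        return "ok"                      # plain ASCII
    if b0 < 0xC0:
        return "stray continuation"      # 10xxxxxx with no prefix byte
    if b0 < 0xE0:
        need, cp, minimum = 1, b0 & 0x1F, 0x80
    elif b0 < 0xF0:
        need, cp, minimum = 2, b0 & 0x0F, 0x800
    elif b0 < 0xF8:
        need, cp, minimum = 3, b0 & 0x07, 0x10000
    else:
        return "invalid prefix"          # assumption: 0xF8..0xFF rejected

    tail = seq[1:1 + need]
    for b in tail:
        if b & 0xC0 != 0x80:
            return "missing continuation"
        cp = (cp << 6) | (b & 0x3F)
    if len(tail) < need:
        return "truncated"               # input ended mid-sequence
    if cp < minimum:
        return "overlong"                # fits in a shorter encoding
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"               # UTF-16 surrogate range
    if cp > 0x10FFFF:
        return "above U+10FFFF"          # disallowed by RFC 3629
    return "ok"
```

For instance, `classify_utf8(b"\xc0\x80")` reports an overlong encoding of U+0000, and `classify_utf8(b"\xed\xa0\x80")` reports a surrogate (U+D800). A counter like the "Bad Sequences" total would be incremented once for every result other than "ok".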