-rw-r--r-- uxd.1 49
-rw-r--r-- uxd.rst 51
2 files changed, 78 insertions, 22 deletions
diff --git a/uxd.1 b/uxd.1
index 55c7b59..d9fb7d4 100644
--- a/uxd.1
+++ b/uxd.1
@@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
-.TH "UXD" 1 "2024-12-12" "0.0.1" "Urchlay's Utilities"
+.TH "UXD" 1 "2024-12-13" "0.0.1" "Urchlay's Utilities"
.SH NAME
uxd \- UTF-8 hex dumper
.SH SYNOPSIS
@@ -50,6 +50,10 @@ Output is written to standard output, which is normally a
terminal. It\(aqs assumed that the terminal supports ANSI\-style color and
UTF\-8. See \fBTERMINAL SUPPORT\fP below. If you want to pipe the output
to a pager, try \fBless \-R\fP\&.
+.SH OPTIONS
+.sp
+There are no options yet.
+.SH OUTPUT FORMAT
.sp
Each line of output consists of eighteen columns: the offset from the
start of the file (in hex; minimum 4 digits), 16 bytes of hex
@@ -60,9 +64,23 @@ The hex bytes and human\-readable data are colorized to make it obvious
which bytes make up each character. Since UTF\-8 is a variable\-width
encoding, this means that one character may be composed of up to
4 bytes.
-.SH OPTIONS
.sp
-There are no options yet.
+The hex bytes that make up one character are displayed in the same
+color, which alternates between yellow and green for successive
+characters. In addition, they have dashes instead of spaces between
+them. An example would be \fBc3\-b1\fP (for an ñ character).
+.sp
+The 16\-byte hex display always has an extra "spacer" column in the
+center. Normally this is a space, but if a multibyte character spans
+it, it will be a dash (so there\(aqll be two dashes: \fBc3\-\-b1\fP).
+.sp
+Since the output lines are always 16 hex bytes, multibyte characters
+can span two lines. When this happens, the character itself will be
+printed on the first line, along with the first byte(s) in hex. The
+last hex byte will be followed by a dash, and the next line of hex
+dump will have the remaining bytes (in the same color as the first
+bytes and character). This sounds complicated, but it\(aqs easy to
+understand once you see it a few times.
.SH EXAMPLE
.sp
It\(aqs hard to give a proper example, since man pages don\(aqt support
@@ -89,13 +107,14 @@ The colors are indicated by G/Y/P, for green, yellow, and purple. The
character above each letter is displayed in that color.
.sp
From the colorization, and from the dashes between the bytes, it\(aqs
-obvious that the "c2 a5" is the hex representation of the first ¥
+obvious that "c2 a5" is the hex representation of the first ¥
character, and that the ǥ is represented by "c7 a5".
.sp
The newline is displayed in purple because it\(aqs not a regular
printable character. Its human\-readable representation is ↵. Note
that if a regular ↵ character appears in the input, it\(aqll be
-rendered in either green or yellow (as a regular character).
+rendered in either green or yellow (so you can tell it\(aqs not just
+another newline).
.SH COLORS
.INDENT 0.0
.TP
@@ -133,19 +152,27 @@ Codepoints above U+10FFFF, which are disallowed by RFC 3629.
\fBuxd\fP should work with any modern terminal that supports color,
ANSI\-style escape sequences, Unicode, and UTF\-8 rendering.
.sp
-The author\(aqs testing is done primarily with \fBurxvt\fP(1). Other
-terminals aren\(aqt tested as often.
+The author\(aqs testing is done primarily with \fBurxvt\fP(1). Other
+terminals aren\(aqt tested as often. Some terminals may need UTF\-8
+enabled, if it\(aqs not on by default (e.g. xterm).
.sp
Known to work: urxvt, xterm, st, xfce4\-terminal, gnome\-terminal, kitty, the Linux console (but
see \fBFONTS\fP, below).
.sp
-Known \fBnot\fP to work: rxvt (doesn\(aqt support Unicode at all).
+Known \fBnot\fP to work: rxvt (doesn\(aqt support Unicode at all), and its
+derivatives such as aterm.
.SH FONTS
.sp
For the human\-readable column to display correctly, you\(aqll need a font
-with lots of glyphs. Try \fIDeja Vu Sans Mono\fP, \fISymbola\fP, \fIQuivira\fP\&.
-If you use urxvt, it searches for glyphs in multiple fonts, so you can
-use all of the above at once.
+with lots of glyphs. Try \fIDeja Vu Sans Mono\fP, \fISymbola\fP, or \fIQuivira\fP
+(although it\(aqs not really a terminal font). If you use urxvt, it
+searches for glyphs in multiple fonts, so you can use all of the above
+at once.
+.sp
+Any glyph your font lacks, you\(aqll see as a dotted box, or perhaps
+a solid block. This isn\(aqt something \fBuxd\fP can do anything about;
+you\(aqll have to use a different font, or (if you use urxvt) add another
+font to your URxvt*font resource.
.sp
The Linux console is capable of rendering UTF\-8, but it\(aqs incapable
of displaying more than 512 glyphs. Most console fonts only define
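A minimal Python sketch of the hex-column rules added to uxd.1 above: bytes of one character are joined by dashes, and the spacer column after the eighth byte is a space normally but a dash when a character spans it. Colors are omitted, and the trailing dash for a character that continues onto the next line is not handled. The names utf8_len and hex_column are hypothetical. This is an illustration, not uxd's actual code.

```python
# Hypothetical sketch (not uxd's code): build the hex column for one
# row of up to 16 bytes. Colors are omitted.

def utf8_len(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # stray continuation or invalid byte: show it on its own


def hex_column(row: bytes) -> str:
    """Join bytes of one character with '-', separate characters with ' ',
    and double the separator after the 8th byte (the spacer column)."""
    # same_char[i] is True when byte i+1 belongs to the same character as byte i.
    same_char = []
    i = 0
    while i < len(row):
        n = min(utf8_len(row[i]), len(row) - i)  # cap at the end of the row
        same_char.extend([True] * (n - 1) + [False])
        i += n
    parts = []
    for i, b in enumerate(row):
        parts.append(f"{b:02x}")
        if i < len(row) - 1:
            sep = "-" if same_char[i] else " "
            # Spacer after the 8th byte: "  " normally, "--" inside a character.
            parts.append(sep * 2 if i == 7 else sep)
    return "".join(parts)


print(hex_column("ñ".encode("utf-8")))                # c3-b1
print(hex_column("¥ǥ\n".encode("utf-8")))             # c2-a5 c7-a5 0a
print(hex_column(b"abcdefg" + "ñ".encode("utf-8")))   # 61 62 63 64 65 66 67 c3--b1
```

The byte values match the ones quoted in the page (c3-b1 for ñ, c2 a5 for ¥, c7 a5 for ǥ); the sketch assumes valid UTF-8 within the row.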
diff --git a/uxd.rst b/uxd.rst
index 8aa04fe..35bf9eb 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -38,6 +38,14 @@ terminal. It's assumed that the terminal supports ANSI-style color and
UTF-8. See **TERMINAL SUPPORT** below. If you want to pipe the output
to a pager, try **less -R**.
+OPTIONS
+=======
+
+There are no options yet.
+
+OUTPUT FORMAT
+=============
+
Each line of output consists of eighteen columns: the offset from the
start of the file (in hex; minimum 4 digits), 16 bytes of hex
data (or empty cells, if the last line of the dump is for fewer than
@@ -48,10 +56,22 @@ which bytes make up each character. Since UTF-8 is a variable-width
encoding, this means that one character may be composed of up to
4 bytes.
-OPTIONS
-=======
+The hex bytes that make up one character are displayed in the same
+color, which alternates between yellow and green for successive
+characters. In addition, they have dashes instead of spaces between
+them. An example would be **c3-b1** (for an ñ character).
-There are no options yet.
+The 16-byte hex display always has an extra "spacer" column in the
+center. Normally this is a space, but if a multibyte character spans
+it, it will be a dash (so there'll be two dashes: **c3--b1**).
+
+Since the output lines are always 16 hex bytes, multibyte characters
+can span two lines. When this happens, the character itself will be
+printed on the first line, along with the first byte(s) in hex. The
+last hex byte will be followed by a dash, and the next line of hex
+dump will have the remaining bytes (in the same color as the first
+bytes and character). This sounds complicated, but it's easy to
+understand once you see it a few times.
EXAMPLE
=======
@@ -72,13 +92,14 @@ The colors are indicated by G/Y/P, for green, yellow, and purple. The
character above each letter is displayed in that color.
From the colorization, and from the dashes between the bytes, it's
-obvious that the "c2 a5" is the hex representation of the first ¥
+obvious that "c2 a5" is the hex representation of the first ¥
character, and that the ǥ is represented by "c7 a5".
The newline is displayed in purple because it's not a regular
printable character. Its human-readable representation is ↵. Note
that if a regular ↵ character appears in the input, it'll be
-rendered in either green or yellow (as a regular character).
+rendered in either green or yellow (so you can tell it's not just
+another newline).
COLORS
======
@@ -112,21 +133,29 @@ TERMINAL SUPPORT
**uxd** should work with any modern terminal that supports color,
ANSI-style escape sequences, Unicode, and UTF-8 rendering.
-The author's testing is done primarily with **urxvt**\(1). Other
-terminals aren't tested as often.
+The author's testing is done primarily with **urxvt**\(1). Other
+terminals aren't tested as often. Some terminals may need UTF-8
+enabled, if it's not on by default (e.g. xterm).
Known to work: urxvt, xterm, st, xfce4-terminal, gnome-terminal, kitty, the Linux console (but
see **FONTS**, below).
-Known **not** to work: rxvt (doesn't support Unicode at all).
+Known **not** to work: rxvt (doesn't support Unicode at all), and its
+derivatives such as aterm.
FONTS
=====
For the human-readable column to display correctly, you'll need a font
-with lots of glyphs. Try *Deja Vu Sans Mono*, *Symbola*, *Quivira*.
-If you use urxvt, it searches for glyphs in multiple fonts, so you can
-use all of the above at once.
+with lots of glyphs. Try *Deja Vu Sans Mono*, *Symbola*, or *Quivira*
+(although it's not really a terminal font). If you use urxvt, it
+searches for glyphs in multiple fonts, so you can use all of the above
+at once.
+
+Any glyph your font lacks, you'll see as a dotted box, or perhaps
+a solid block. This isn't something **uxd** can do anything about;
+you'll have to use a different font, or (if you use urxvt) add another
+font to your URxvt*font resource.
The Linux console is capable of rendering UTF-8, but it's incapable
of displaying more than 512 glyphs. Most console fonts only define
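A minimal Python sketch of the line-spanning rule in the OUTPUT FORMAT text added to uxd.rst above: input is cut into 16-byte rows, a character that straddles a row boundary is printed on the row where its first byte falls, and that row is flagged so its hex would end with a dash. It assumes valid UTF-8 text and ignores colors; the function name layout_rows and its return shape are hypothetical, and uxd's own implementation may differ.

```python
# Hypothetical sketch (not uxd's code): split input into 16-byte rows and
# work out which characters each row prints, assuming valid UTF-8 text.

def layout_rows(text: str, width: int = 16) -> list[tuple[str, str, bool]]:
    """Return (offset, chars, continues) for each row.

    offset:    row start in hex, at least 4 digits.
    chars:     characters whose first byte falls in this row; a character
               spanning two rows is printed on the first one.
    continues: True when the row's last byte belongs to a character that
               finishes on the next row (its hex would end with a dash).
    """
    # (byte offset, character, encoded length) for every character.
    spans = []
    pos = 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        spans.append((pos, ch, n))
        pos += n
    rows = []
    for start in range(0, pos, width):
        end = min(start + width, pos)
        chars = "".join(ch for off, ch, n in spans if start <= off < end)
        continues = any(off < end < off + n for off, _, n in spans)
        rows.append((f"{start:04x}", chars, continues))
    return rows


for row in layout_rows("a" * 15 + "ñb"):
    print(row)
# ('0000', 'aaaaaaaaaaaaaaañ', True)
# ('0010', 'b', False)
```

Here the ñ starts at byte 15, so its c3 byte ends the first row and its b1 byte opens the second, matching the "character on the first line, remaining bytes on the next" behavior described above.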