1 files changed, 198 insertions, 0 deletions
diff --git a/uxd.1 b/uxd.1
new file mode 100644
index 0000000..87886b3
--- /dev/null
+++ b/uxd.1
@@ -0,0 +1,198 @@
+.\" Man page generated from reStructuredText.
+.
+.
+.nr rst2man-indent-level 0
+.
+.de1 rstReportMargin
+\\$1 \\n[an-margin]
+level \\n[rst2man-indent-level]
+level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
+-
+\\n[rst2man-indent0]
+\\n[rst2man-indent1]
+\\n[rst2man-indent2]
+..
+.de1 INDENT
+.\" .rstReportMargin pre:
+. RS \\$1
+. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
+. nr rst2man-indent-level +1
+.\" .rstReportMargin post:
+..
+.de UNINDENT
+. RE
+.\" indent \\n[an-margin]
+.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.nr rst2man-indent-level -1
+.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
+..
+.TH "UXD" 1 "2024-12-12" "0.0.1" "Urchlay's Utilities"
+.SH NAME
+uxd \- UTF-8 hex dumper
+.SH SYNOPSIS
+.sp
+uxd [\fIfile\fP | \fI\-\fP]
+.SH DESCRIPTION
+.sp
+\fBuxd\fP is a hex dump utility that\(aqs aware of UTF\-8 multibyte sequence
+semantics.
+.sp
+Input is read from \fIfile\fP, or standard input if \fIfile\fP is missing or
+given as \fB\-\fP\&. The input is treated as UTF\-8 encoded Unicode. Since
+ASCII is a subset, \fBuxd\fP works fine on plain ASCII files too. Other
+encodings such as UTF\-16, ISO\-8859\-*, Shift\-JIS, etc., can be used, but
+\fBuxd\fP won\(aqt handle these any better than a regular hex\-dump utility
+such as \fBxxd\fP\&.
+.sp
+Output is written to standard output, which is normally a
+terminal. It\(aqs assumed that the terminal supports ANSI\-style color and
+UTF\-8. See \fBTERMINAL SUPPORT\fP below.
+.sp
+Each line of output consists of eighteen columns: the offset from the
+start of the file (in hex; minimum 4 digits), 16 bytes of hex
+data (or empty cells, if the last line of the dump is for fewer than
+16 bytes), and the human\-readable form of the same data.
+.sp
+The hex bytes and human\-readable data are colorized to make it obvious
+which bytes make up each character. Since UTF\-8 is a variable\-width
+encoding, one character may be composed of anywhere from 1 to
+4 bytes.
+.SH OPTIONS
+.sp
+There are no options yet.
+.SH EXAMPLE
+.sp
+It\(aqs hard to give a proper example, since man pages don\(aqt support
+color. You\(aqll have to use your imagination. Also, this section of
+the man page requires your man command to support UTF\-8 embedded in
+the man page. If the example looks mangled, try viewing the source
+(uxd.rst) in a text editor.
+.sp
+Multibyte (non\-ASCII) characters:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+$ echo ¥ǥ£¥ | uxd
+0000: c2 a5 c7 a5 c2 a3 c2 a5  0a                       ¥ǥ£¥↵
+      GG GG YY YY GG GG YY YY  PP                       GYGYP
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+The colors are indicated by G/Y/P, for green, yellow, and purple. The
+character above each letter is displayed in that color.
+.sp
+From the colorization, it\(aqs obvious that the "c2 a5" is the hex
+representation of the first ¥ character, and that the ǥ is
+represented by "c7 a5".
+.sp
+The newline is displayed in purple because it\(aqs not a regular
+printable character. Its human\-readable representation is ↵. Note
+that if a literal ↵ character (U+21B5) appears in the input, it\(aqll be
+rendered in either green or yellow (as a regular character).
+.SH COLORS
+.INDENT 0.0
+.TP
+.B \fBgreen\fP, \fByellow\fP
+Printable characters (except the space, U+0020) alternate between green and yellow.
+.TP
+.B \fBpurple\fP
+Spaces and unprintable characters ("control" characters, newlines, tabs, etc).
+These are printed as "visible" characters, e.g. ␣ for the space, ↵ for a newline.
+This is an improvement over standard hex dumpers, which usually print
+these as periods.
+.TP
+.B \fBred\fP
+Invalid UTF\-8 sequences. These are rendered with a red foreground, to make them
+stand out. Examples of invalid sequences:
+.INDENT 7.0
+.INDENT 3.5
+.INDENT 0.0
+.IP \(bu 2
+Prefix bytes (>= 0xc0) which are not followed by the correct number of continuation
+bytes (those with their high 2 bits set to \fB10\fP).
+.IP \(bu 2
+Continuation bytes that aren\(aqt preceded by a valid prefix byte.
+.IP \(bu 2
+Truncated UTF\-8 sequence at EOF.
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.SH TERMINAL SUPPORT
+.sp
+\fBuxd\fP should work with any modern terminal that supports color,
+ANSI\-style escape sequences, Unicode, and UTF\-8 rendering.
+.sp
+The author\(aqs testing is done primarily with \fBurxvt\fP(1).  Other
+terminals aren\(aqt tested as often.
+.sp
+Known to work: urxvt, xterm, st, xfce4\-terminal, gnome\-terminal, the Linux console (but
+see \fBFONTS\fP, below).
+.sp
+Known \fBnot\fP to work: rxvt (doesn\(aqt support Unicode at all).
+.SH FONTS
+.sp
+For the human\-readable column to display correctly, you\(aqll need a font
+with lots of glyphs. Try \fIDejaVu Sans Mono\fP, \fISymbola\fP, \fIQuivira\fP\&.
+If you use urxvt, it searches for glyphs in multiple fonts, so you can
+use all of the above at once.
+.sp
+The Linux console is capable of rendering UTF\-8, but it\(aqs incapable
+of displaying more than 512 glyphs. Most console fonts only define
+256, since using more than 256 means the console won\(aqt be able to
+do bold. Expect to see lots of solid or dotted boxes. This isn\(aqt
+specifically a problem with \fBuxd\fP\&.
+.SH FILES
+.sp
+\fBuxd\fP doesn\(aqt read any files other than the input file, and doesn\(aqt write to
+any files other than standard output. There\(aqs no config file.
+.SH ENVIRONMENT
+.sp
+\fBuxd\fP doesn\(aqt read anything from the environment. It\(aqs \fInot\fP necessary to
+have a UTF\-8 locale set in e.g. \fBLANG\fP or \fBLC_ALL\fP\&. Also, the \fBTERM\fP
+variable is not used.
+.SH EXIT STATUS
+.sp
+Zero for success, non\-zero for failure.
+.sp
+Failure status should only be returned if \fBuxd\fP failed to open the
+input file. Invalid input (non\-UTF\-8) doesn\(aqt count as an error;
+it\(aqll just have lots of red in the output.
+.SH BUGS
+.sp
+\fBuxd\fP doesn\(aqt check for overlong UTF\-8 encodings (e.g. a character
+that could be a 1\-byte sequence, but is encoded as 2 or more).
+Sequences like this really should be colorized in red. Technically,
+this means \fBuxd\fP accepts a superset of UTF\-8.
+.sp
+RFC 3629 doesn\(aqt allow UTF\-8 to use codepoints above U+10FFFF. 4\-byte
+sequences can support codepoints U+110000 to U+1FFFFF, which are not
+valid Unicode. If these occur in the input, \fBuxd\fP should colorize
+them in red, but it doesn\(aqt (yet).
+.sp
+There should be options and/or a config file to change the colors,
+rather than baking them into the binary.
+.sp
+Combining characters are not handled well. Or at all, really: the 2
+characters being combined will have an ANSI color code in between.
+urxvt at least ignores the color code, so the composite character
+displays in the color of the first (non\-combining) character. I\(aqm not
+sure what a better solution would be...
+.SH COPYRIGHT
+.sp
+Licensed under the WTFPL. See \fI\%http://www.wtfpl.net/txt/copying/\fP for details.
+.SH AUTHORS
+.INDENT 0.0
+.sp
+B. Watson <\fI\%urchlay@slackware.uk\fP>.
+.UNINDENT
+.SH SEE ALSO
+.sp
+xxd(1), bvi(1), utf\-8(7), unicode(7)
+.\" Generated by docutils manpage writer.
+.