Diffstat (limited to 'uxd.1')
-rw-r--r-- | uxd.1 | 198 |
1 files changed, 198 insertions, 0 deletions
@@ -0,0 +1,198 @@
+.\" Man page generated from reStructuredText.
+.
+.
+.nr rst2man-indent-level 0
+.
+.de1 rstReportMargin
+\\$1 \\n[an-margin]
+level \\n[rst2man-indent-level]
+level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
+-
+\\n[rst2man-indent0]
+\\n[rst2man-indent1]
+\\n[rst2man-indent2]
+..
+.de1 INDENT
+.\" .rstReportMargin pre:
+. RS \\$1
+. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
+. nr rst2man-indent-level +1
+.\" .rstReportMargin post:
+..
+.de UNINDENT
+. RE
+.\" indent \\n[an-margin]
+.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.nr rst2man-indent-level -1
+.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
+..
+.TH "UXD" 1 "2024-12-12" "0.0.1" "Urchlay's Utilities"
+.SH NAME
+uxd \- UTF-8 hex dumper
+.SH SYNOPSIS
+.sp
+uxd [\fIfile\fP | \fI\-\fP]
+.SH DESCRIPTION
+.sp
+\fBuxd\fP is a hex dump utility that\(aqs aware of UTF\-8 multibyte sequence
+semantics.
+.sp
+Input is read from \fIfile\fP, or standard input if \fIfile\fP is missing or
+given as \fB\-\fP\&. The input is treated as UTF\-8 encoded Unicode. Since
+ASCII is a subset of UTF\-8, \fBuxd\fP works fine on plain ASCII files too. Other
+encodings such as UTF\-16, ISO\-8859\-*, Shift\-JIS, etc., can be used, but
+\fBuxd\fP won\(aqt handle these any better than a regular hex\-dump utility
+such as \fBxxd\fP\&.
+.sp
+Output is written to standard output, which is normally a
+terminal. It\(aqs assumed that the terminal supports ANSI\-style color and
+UTF\-8. See \fBTERMINAL SUPPORT\fP below.
+.sp
+Each line of output consists of eighteen columns: the offset from the
+start of the file (in hex; minimum 4 digits), 16 bytes of hex
+data (or empty cells, if the last line of the dump is for fewer than
+16 bytes), and the human\-readable form of the same data.
+.sp
+The hex bytes and human\-readable data are colorized to make it obvious
+which bytes make up each character. Since UTF\-8 is a variable\-width
+encoding, a single character may be composed of anywhere from 1 to
+4 bytes.
+.SH OPTIONS
+.sp
+There are no options yet.
+.SH EXAMPLE
+.sp
+It\(aqs hard to give a proper example, since man pages don\(aqt support
+color. You\(aqll have to use your imagination. Also, this section of
+the man page requires your man command to support UTF\-8 embedded in
+the man page. If the example looks mangled, try viewing the source
+(uxd.rst) in a text editor.
+.sp
+Some non\-ASCII characters:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+$ echo ¥ǥ£¥ | uxd
+0000: c2 a5 c7 a5 c2 a3 c2 a5 0a ¥ǥ£¥↵
+      GG GG YY YY GG GG YY YY PP GYGYP
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+The colors are indicated by G/Y/P, for green, yellow, and purple. The
+character above each letter is displayed in that color.
+.sp
+From the colorization, it\(aqs obvious that "c2 a5" is the hex
+representation of the first ¥ character, and that ǥ is
+represented by "c7 a5".
+.sp
+The newline is displayed in purple because it\(aqs not a regular
+printable character. Its human\-readable representation is ↵. Note
+that if a regular ↵ character appears in the input, it\(aqll be
+rendered in either green or yellow (as a regular character).
+.SH COLORS
+.INDENT 0.0
+.TP
+.B \fBgreen\fP, \fByellow\fP
+Printable characters (except the space, U+0020) alternate between green and yellow.
+.TP
+.B \fBpurple\fP
+Spaces and unprintable characters ("control" characters, newlines, tabs, etc.).
+These are printed as "visible" characters, e.g. ␣ for the space, ↵ for a newline.
+This is an improvement over standard hex dumpers, which usually print
+these as periods.
+.TP
+.B \fBred\fP
+Invalid UTF\-8 sequences. These are rendered with a red foreground, to make them
+stand out. Examples of invalid sequences:
+.INDENT 7.0
+.INDENT 3.5
+.INDENT 0.0
+.IP \(bu 2
+Prefix bytes (>= 0xC0) which are not followed by the correct number of continuation
+bytes (those whose high 2 bits are \fB10\fP).
+.IP \(bu 2
+Continuation bytes that aren\(aqt preceded by a valid prefix byte.
+.IP \(bu 2
+A truncated UTF\-8 sequence at EOF.
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.SH TERMINAL SUPPORT
+.sp
+\fBuxd\fP should work with any modern terminal that supports color,
+ANSI\-style escape sequences, Unicode, and UTF\-8 rendering.
+.sp
+The author\(aqs testing is done primarily with \fBurxvt\fP(1). Other
+terminals aren\(aqt tested as often.
+.sp
+Known to work: urxvt, xterm, st, xfce4\-terminal, gnome\-terminal, the Linux console (but
+see \fBFONTS\fP, below).
+.sp
+Known \fBnot\fP to work: rxvt (doesn\(aqt support Unicode at all).
+.SH FONTS
+.sp
+For the human\-readable column to display correctly, you\(aqll need a font
+with lots of glyphs. Try \fIDejaVu Sans Mono\fP, \fISymbola\fP, or \fIQuivira\fP\&.
+If you use urxvt, it searches for glyphs in multiple fonts, so you can
+use all of the above at once.
+.sp
+The Linux console is capable of rendering UTF\-8, but it\(aqs incapable
+of displaying more than 512 glyphs. Most console fonts only define
+256, since using more than 256 means the console won\(aqt be able to
+do bold. Expect to see lots of solid or dotted boxes. This isn\(aqt
+specifically a problem with \fBuxd\fP\&.
+.SH FILES
+.sp
+\fBuxd\fP doesn\(aqt read any files other than the input file, and doesn\(aqt write to
+any files other than standard output. There\(aqs no config file.
+.SH ENVIRONMENT
+.sp
+\fBuxd\fP doesn\(aqt read anything from the environment. It\(aqs \fInot\fP necessary to
+have a UTF\-8 locale set in e.g. \fBLANG\fP or \fBLC_ALL\fP\&. Also, the \fBTERM\fP
+variable is not used.
+.SH EXIT STATUS
+.sp
+Zero for success, non\-zero for failure.
+.sp
+Failure status should only be returned if \fBuxd\fP failed to open the
+input file. Invalid input (non\-UTF\-8) doesn\(aqt count as an error;
+it\(aqll just have lots of red in the output.
+.SH BUGS
+.sp
+\fBuxd\fP doesn\(aqt check for overlong UTF\-8 encodings (e.g. a character
+that could be a 1\-byte sequence, but is encoded as 2 or more).
+Sequences like this really should be colorized in red. Technically,
+this means \fBuxd\fP supports WTF\-8, not UTF\-8.
+.sp
+RFC 3629 doesn\(aqt allow UTF\-8 to encode codepoints above U+10FFFF. 4\-byte
+sequences can represent values up to U+1FFFFF, but U+110000 and above are not
+valid Unicode. If these occur in the input, \fBuxd\fP should colorize
+them in red, but it doesn\(aqt (yet).
+.sp
+There should be options and/or a config file to change the colors,
+rather than baking them into the binary.
+.sp
+Combining characters are not handled well. Or at all, really: the 2
+characters being combined will have an ANSI color code in between.
+urxvt at least ignores the color code, so the composite character
+displays in the color of the first (non\-combining) character. I\(aqm not
+sure what a better solution would be...
+.SH COPYRIGHT
+.sp
+Licensed under the WTFPL. See \fI\%http://www.wtfpl.net/txt/copying/\fP for details.
+.SH AUTHORS
+.INDENT 0.0
+.IP \(bu 2
+B. Watson <\fI\%urchlay@slackware.uk\fP>.
+.UNINDENT
+.SH SEE ALSO
+.sp
+xxd(1), bvi(1), utf\-8(7), unicode(7)
+.\" Generated by docutils manpage writer.
+.
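The man page's COLORS section describes which byte patterns count as invalid UTF-8. Its BUGS section names two checks that uxd doesn't do yet: overlong encodings and codepoints above U+10FFFF. The Python sketch below shows one way a UTF-8-aware dumper could classify each sequence and pick a color for it. It is not uxd's source code. The function names `classify` and `colorize` are hypothetical, and so is the rule that purple and red cells don't advance the green/yellow alternation (the man page doesn't say either way). Surrogate codepoints are deliberately left unchecked.

```python
from __future__ import annotations


def classify(data: bytes, i: int) -> tuple[int, int | None]:
    """Decode the UTF-8 sequence starting at data[i] (hypothetical helper).

    Returns (length, codepoint); codepoint is None for invalid input.
    A lone bad byte consumes only 1 byte, so the caller can resynchronize.
    """
    b = data[i]
    if b < 0x80:                    # 0xxxxxxx: ASCII, 1 byte
        return 1, b
    if 0xC0 <= b < 0xE0:            # 110xxxxx: prefix of a 2-byte sequence
        n, cp, lowest = 2, b & 0x1F, 0x80
    elif 0xE0 <= b < 0xF0:          # 1110xxxx: prefix of a 3-byte sequence
        n, cp, lowest = 3, b & 0x0F, 0x800
    elif 0xF0 <= b < 0xF8:          # 11110xxx: prefix of a 4-byte sequence
        n, cp, lowest = 4, b & 0x07, 0x10000
    else:                           # stray continuation byte (10xxxxxx) or 0xF8-0xFF
        return 1, None
    if i + n > len(data):           # truncated sequence at EOF
        return 1, None
    for c in data[i + 1:i + n]:
        if (c & 0xC0) != 0x80:      # expected a continuation byte (10xxxxxx)
            return 1, None
        cp = (cp << 6) | (c & 0x3F)
    if cp < lowest:                 # overlong encoding (listed under BUGS)
        return n, None
    if cp > 0x10FFFF:               # above the RFC 3629 limit (listed under BUGS)
        return n, None
    return n, cp                    # surrogates U+D800-U+DFFF are not rejected here


def colorize(data: bytes) -> list[tuple[str, str]]:
    """Pair each sequence's hex bytes with a color name, following COLORS."""
    out = []
    green = True                    # the next printable character is green
    i = 0
    while i < len(data):
        n, cp = classify(data, i)
        if cp is None:
            color = "red"
        elif cp == 0x20 or not chr(cp).isprintable():
            color = "purple"        # space, newline, tab, other controls
        else:
            color = "green" if green else "yellow"
            green = not green       # assumption: only printable chars alternate
        out.append((data[i:i + n].hex(" "), color))
        i += n
    return out


if __name__ == "__main__":
    for hex_bytes, color in colorize("¥ǥ£¥\n".encode("utf-8")):
        print(f"{hex_bytes:<12}{color}")
```

Run on the input from the EXAMPLE section, this sketch pairs "c2 a5" with green, "c7 a5" with yellow, "c2 a3" with green, "c2 a5" with yellow, and "0a" with purple. That matches the G/Y/P row shown there.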