.\" Man page generated from reStructuredText.
.
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "UXD" 1 "2024-12-23" "0.3.0" "Urchlay's Utilities"
.SH NAME
uxd \- UTF-8 hex dumper
.SH SYNOPSIS
.sp
uxd [\fB\-n\fP] [\fB\-c\fP \fIcolors\fP] [\fB\-d\fP \fIdata\fP] [\fB\-l\fP \fIlength\fP] [\fB\-o\fP \fIoffset\fP] [[\fB\-s\fP | \fB\-S\fP] \fIseekpos\fP] [\-[\fB1abchijmnprtTuvw\fP] ...] [\fIfile\fP | \fI\-\fP]
.SH DESCRIPTION
.sp
\fBuxd\fP is a hex dump utility that\(aqs aware of UTF\-8 multibyte sequence semantics, and uses colorized output to indicate which byte sequences go with which human\-readable characters.
.sp
Input is read from \fIfile\fP, or from standard input if \fIfile\fP is missing or given as \fB\-\fP\&.
The input is treated as UTF\-8 encoded Unicode.
Since ASCII is a subset of UTF\-8, \fBuxd\fP works fine on plain ASCII files too.
Other encodings such as UTF\-16, ISO\-8859\-*, Shift\-JIS, etc., can be used, but \fBuxd\fP won\(aqt handle these any better than a regular hex\-dump utility such as \fBxxd\fP\&.
.sp
Output is written to standard output, which is normally a terminal.
It\(aqs assumed that the terminal supports ANSI\-style color and UTF\-8.
See \fBTERMINAL SUPPORT\fP below.
If you want to pipe the output to a pager, try \fBless \-R\fP\&.
.SH OPTIONS
.sp
These options can be used on the command line, in the \fBUXD_OPTS\fP environment variable, or both.
The command line takes precedence over the environment.
.sp
Options can be bundled: \fB\-ubc1234\fP is the same as \fB\-u\fP \fB\-b\fP \fB\-c 1234\fP\&.
The one exception is the \fB\-n\fP option, which should appear by itself.
.sp
The options that accept numbers (\fB\-l\fP, \fB\-o\fP, \fB\-s\fP, and \fB\-S\fP) allow decimal or hex (with a \fI0x\fP prefix).
You can use the suffixes \fIk\fP, \fIm\fP, \fIg\fP, and \fIt\fP for power\-of\-2 based kilobytes, megabytes, gigabytes, or terabytes (e.g. \fI1k\fP is 1024 bytes), as well as \fIK\fP, \fIM\fP, \fIG\fP, and \fIT\fP for power\-of\-10 based units (e.g. \fI1K\fP is 1000 bytes).
A decimal point can also be used: \fB1.5K\fP is 1500 bytes, and \fB1.5k\fP is 1536 bytes.
.\" the comments are turned into the --help message by mkusage.pl.
.
.INDENT 0.0
.TP
.B \-\-
No more options; whatever comes after this is a filename, even if it begins with \fB\-\fP\&.
.UNINDENT
.\" no more options.
.
.INDENT 0.0
.TP
.B \-1
Don\(aqt alternate colors.
.UNINDENT
.\" don't alternate colors.
.
.INDENT 0.0
.TP
.B \-a
Don\(aqt dump lines that consist entirely of ASCII characters (codepoints U+0000 to U+007F).
.TP
.B \-b
Bold output.
This may be more or less readable, depending on your terminal and its color settings.
Ignored if \fB\-m\fP is given.
.UNINDENT
.\" bold color output.
.
.INDENT 0.0
.TP
.BI \-c \ nnnnn
Set the colors to use.
Must be 1 to 5 digits, each from 0 to 7.
These are standard ANSI colors:
.INDENT 7.0
.TP
.B 0
black
.TP
.B 1
red
.TP
.B 2
green
.TP
.B 3
yellow
.TP
.B 4
blue
.TP
.B 5
purple
.TP
.B 6
cyan
.TP
.B 7
white
.UNINDENT
.sp
The first 2 digits are the alternating colors for normal characters.
The 3rd and 4th (optional) are the alternating colors for non\-printable and space characters.
The 5th (optional) is for invalid UTF\-8 sequences.
.sp
Default colors are \fB23561\fP\&.
If fewer than 5 colors are supplied, the remaining colors keep their default values.
.sp
Note that the default color set doesn\(aqt include white or black, because one of these is usually the terminal\(aqs background color.
It also avoids blue, because blue text is hard to read on many terminals.
.sp
This option also disables a prior \fB\-m\fP option.
.UNINDENT
.\" colors (1 to 5 digits, 0 to 7).
.
.INDENT 0.0
.TP
.BI \-d \ data
Dump this data, instead of reading from a file or stdin.
If \fIdata\fP contains spaces or shell metacharacters, make sure you remember to quote it.
Only one \fB\-d\fP option can be given.
.UNINDENT
.\" dump this data instead of a file.
.
.INDENT 0.0
.TP
.B \-h\fP,\fB \-\-help
Print the built\-in usage message and exit.
.UNINDENT
.\" print this help message.
.
.INDENT 0.0
.TP
.B \-i
After dumping, print information about the input: number of bytes, characters, ASCII (one\-byte) characters, multi\-byte characters, and bad sequences.
.UNINDENT
.\" print number of bytes/chars/ascii/multibyte/bad sequences.
.
.INDENT 0.0
.TP
.B \-j
Java mode (aka MUTF\-8).
Identical to UTF\-8, except that the overlong \fB0xc0 0x80\fP encoding of codepoint U+0000 (aka NUL) is highlighted in purple and not counted as an error.
This may be useful for looking at serialized data created by Java programs.
.UNINDENT
.\" java (MUTF-8) mode: allow 0xc0 0x80 for U+0000.
.
.INDENT 0.0
.TP
.BI \-l \ length
Stop dumping after \fIlength\fP bytes (not characters).
If the limit is reached in the middle of a multibyte character, the entire character will be dumped.
Negative \fIlength\fP doesn\(aqt make sense, and is an error.
.UNINDENT
.\" stop dumping after bytes (not characters).
.
.INDENT 0.0
.TP
.B \-m
Monochrome mode.
Uses underline, bold, and reverse video instead of color.
Use this if you have trouble distinguishing the colors, or if they look too much like angry fruit salad.
Disables prior \fB\-b\fP and \fB\-c\fP options.
.UNINDENT
.\" monochrome mode.
.
.INDENT 0.0
.TP
.B \-n
Ignore the \fBUXD_OPTS\fP environment variable.
This option should not be bundled with other options (e.g. use \fB\-n \-u\fP, not \fB\-nu\fP).
.UNINDENT
.\" ignore UXD_OPTS environment variable.
.
.INDENT 0.0
.TP
.BI \-o \ offset
Add this amount to the hex offsets (left column).
May be negative, if you can think of a reason to want it to be.
.UNINDENT
.\" added to hex offsets (decimal, 0x hex, 0 octal).
.
.INDENT 0.0
.TP
.B \-p
Permissive mode.
Turns off error highlighting for overlongs, codepoints above \fBU+10FFFF\fP, and surrogates.
Only malformed sequences will be highlighted in red.
.TP
.B \-r
Don\(aqt highlight multi\-byte sequences in reverse video.
.UNINDENT
.\" don't highlight multi-byte chars in reverse video.
.
.INDENT 0.0
.TP
.BI \-s \ pos
Seek in input before starting to dump.
\fIpos\fP is in bytes, not characters.
Positive \fIpos\fP means seek forward from the start of the input.
Negative \fIpos\fP only works on files (not standard input); it means seek backward from EOF.
.UNINDENT
.\" seek in input before dumping (-pos = seek back from EOF).
.
.INDENT 0.0
.TP
.BI \-S \ pos
Same as \fB\-s\fP, but the displayed offsets start at 0 rather than at the position after seeking.
\fB\-S 100\fP is the same as \fB\-s 100 \-o \-100\fP\&.
Works with negative \fIpos\fP, too.
.UNINDENT
.\" like -s, but also sets -o so addresses start at 0.
.
.INDENT 0.0
.TP
.B \-t
Put the terminal in UTF\-8 mode, if possible.
Prints the \fBESC % G\fP sequence, which may or may not be supported by your terminal (works for \fBxterm\fP(1); does not work for \fBurxvt\fP(1)).
To avoid surprises, this option also takes the terminal out of UTF\-8 mode before \fBuxd\fP exits, on the theory that you won\(aqt need this option if the terminal normally runs in UTF\-8 mode.
If this assumption turns out to be wrong, you can use the \fB\-T\fP option instead.
.UNINDENT
.\" put terminal in utf-8 mode.
.
.INDENT 0.0
.TP
.B \-T
Same as \fB\-t\fP, but leaves the terminal in UTF\-8 mode on exit.
Can be used to recover from a previous misuse of \fB\-t\fP, like so:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
uxd \-T \-dd
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.\" put terminal in utf-8 mode, and leave it that way on exit.
.
.INDENT 0.0
.TP
.B \-u
Use uppercase hex digits \fIA\-F\fP\&.
Default is lowercase.
.UNINDENT
.\" uppercase hex digits.
.
.INDENT 0.0
.TP
.B \-v\fP,\fB \-\-version
Print the version number and exit.
.UNINDENT
.\" print version of uxd.
.
.INDENT 0.0
.TP
.B \-w
WTF\-8 mode.
Surrogates \fBU+D800\fP to \fBU+DFFF\fP will be highlighted in purple and not counted as errors.
.UNINDENT
.\" WTF-8 mode (allow surrogates).
.
.SH OUTPUT FORMAT
.sp
The output is designed to fit in an 80\-column terminal.
If you want HTML output, you might have a look at \fBaha\fP(1).
.sp
Each line of output consists of eighteen columns: the offset from the start of the file (in hex; minimum 4 digits), 16 bytes of hex data (or empty cells, if the last line of the dump is for fewer than 16 bytes), and the human\-readable form of the same data.
.sp
The hex bytes and human\-readable data are colorized to make it obvious which bytes make up each character.
Since UTF\-8 is a variable\-width encoding, one character may be composed of up to 4 bytes.
.sp
The hex bytes that make up one character are displayed in the same color, which alternates between green and yellow for successive characters.
In addition, they have dashes instead of spaces between them.
An example would be \fBc3\-b1\fP (for an ñ character).
.sp
The 16\-byte hex display always has an extra "spacer" column in the center.
Normally this is a space, but if a multibyte character spans it, it will be a dash (so there\(aqll be two dashes: \fBc3\-\-b1\fP).
.sp
Since each output line holds at most 16 hex bytes, multibyte characters can span two lines.
When this happens, the character itself will be printed on the first line, along with the first byte(s) in hex.
The last hex byte on that line will be followed by a dash, and the next line of the hex dump will have the remaining bytes (in the same color as the first bytes and the character).
This sounds complicated, but it\(aqs easy to understand once you see it a few times.
.SH EXAMPLE
.sp
It\(aqs hard to give a proper example, since man pages don\(aqt support color.
You\(aqll have to use your imagination.
Also, this section requires your \fBman\fP command to support UTF\-8 embedded in the man page.
If the example looks mangled, try viewing the source (uxd.rst) in a text editor.
.sp
Example copied from the Japanese \fBls\fP(1) man page:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ echo デフォル | ./uxd
0000: e3\-83\-87 e3\-83\-95 e3\-82\-\-a9 e3\-83\-ab 0a デフォル↵
      GGGGGGGG YYYYYYYY GGGGGGGGG YYYYYYYY PP G Y G Y P
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The colors are indicated by G/Y/P, for green, yellow, and purple.
The character above each letter is displayed in that color.
.sp
From the colorization, and from the dashes between the bytes, it\(aqs obvious that "e3 83 87" is the hex representation of the first character, and that the 2nd is represented by "e3 83 95".
.sp
The newline is displayed in purple because it\(aqs not a regular printable character.
Its human\-readable representation is ↵.
Note that if a literal ↵ character appears in the input, it\(aqll be rendered in either green or yellow (so you can tell it\(aqs not just another newline).
.SH COLORS
.sp
The colors in this description are the default ones.
They can be changed with the \fB\-c\fP option (see above).
.INDENT 0.0
.TP
.B \fBgreen\fP, \fByellow\fP
Printable characters (except the space, U+0020) alternate between green and yellow.
.TP
.B \fBpurple\fP, \fBcyan\fP
Spaces and unprintable characters ("control" characters, newlines, tabs, etc.) alternate between purple and cyan.
These are printed as "visible" characters, e.g. ␣ for the space, ↵ for a newline.
Hopefully this is an improvement over the usual practice of printing these as periods, as standard hex dumpers do.
The Unicode BOM (byte order mark, U+FEFF) is printed as a purple letter B.
.TP
.B \fBred\fP
Invalid UTF\-8 sequences.
These are rendered as � (U+FFFD) with a red background, to make them stand out.
Invalid sequences are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
Prefix bytes (>= 0xc0) which are not followed by the correct number of continuation bytes (bytes whose high 2 bits are \fB10\fP).
.IP \(bu 2
Continuation bytes that aren\(aqt preceded by a valid prefix byte.
.IP \(bu 2
A truncated UTF\-8 sequence at EOF.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Also, there are sequences that follow the UTF\-8 bit patterns, but don\(aqt encode valid Unicode.
These are normally rendered with a red background.
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
UTF\-16 surrogates (codepoints U+D800 to U+DFFF) [\fB*\fP].
Rendered as \fBS\fP\&.
.IP \(bu 2
Codepoints above U+10FFFF, which are disallowed by the Unicode standard [\fB*\fP].
Rendered as \fB>\fP\&.
.IP \(bu 2
Overlong encodings (e.g. codepoints U+0000 to U+007F encoded as 2 or more bytes) [\fB*\fP].
Rendered as \fBO\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Each error\-highlighted sequence increments the "Bad sequences" count, if the \fB\-i\fP option is used.
.sp
For items marked with [\fB*\fP], the \fB\-j\fP, \fB\-p\fP, and/or \fB\-w\fP options can disable error highlighting for that type of error.
Such sequences are then displayed in purple or cyan rather than red.
.UNINDENT
.SH TERMINAL SUPPORT
.sp
\fBuxd\fP should work with any modern terminal that supports color, ANSI\-style escape sequences, Unicode, and UTF\-8 rendering.
.sp
The author\(aqs testing is done primarily with \fBurxvt\fP(1).
Other terminals aren\(aqt tested as often.
Some terminals may need UTF\-8 enabled if it\(aqs not on by default, either in the terminal\(aqs settings or by using the \fB\-t\fP option to \fBuxd\fP\&.
.sp
Known to work: urxvt, xterm, st, xfce4\-terminal, gnome\-terminal, kitty, konsole, and the Linux console (but see \fBFONTS\fP, below).
.sp
Known \fBnot\fP to work: rxvt (doesn\(aqt support Unicode at all), and its derivatives such as aterm.
.sp
\fBuxd\fP also builds and runs correctly on a Mac running a recent version of OSX with Terminal.app.
.SH FONTS
.sp
For the human\-readable column to display correctly, you\(aqll need a font with lots of glyphs.
Try \fIDejaVu Sans Mono\fP, \fISymbola\fP, or \fIQuivira\fP (although Quivira isn\(aqt really a terminal font).
If you use urxvt, it searches for glyphs in multiple fonts, so you can use all of the above at once.
.sp
Any glyph your font lacks will be shown as a dotted box, or perhaps a solid block.
This isn\(aqt something \fBuxd\fP can do anything about; you\(aqll have to use a different font, or (if you use urxvt) add another font to your URxvt*font resource.
.sp
The Linux console is capable of rendering UTF\-8, but it can\(aqt display more than 512 glyphs.
Most console fonts only define 256, since using more than 256 means the console won\(aqt be able to do bold.
Expect to see lots of solid or dotted boxes.
This isn\(aqt specifically a problem with \fBuxd\fP\&.
.SH FILES
.sp
\fBuxd\fP doesn\(aqt read any files other than the input file, and doesn\(aqt write to any files other than standard output.
There\(aqs no config file.
.SH ENVIRONMENT
.INDENT 0.0
.TP
.B \fBUXD_OPTS\fP
If this is set, its value is treated as a set of options, which get applied before any command\-line options (unless the command\-line options include \fB\-n\fP).
.TP
.B \fBNO_COLOR\fP
If this is set (to any value), \fBuxd\fP runs in monochrome mode, just as though the \fB\-m\fP option were given.
This variable is also respected by \fBxxd\fP\&.
.UNINDENT
.sp
It\(aqs \fInot\fP necessary to have a UTF\-8 locale set in e.g. \fBLANG\fP or \fBLC_ALL\fP\&.
Also, the \fBTERM\fP variable is not used.
.SH EXIT STATUS
.sp
Zero for success, non\-zero for failure.
.sp
Failure status will only be returned if \fBuxd\fP fails to open the input file.
Invalid input (non\-UTF\-8) doesn\(aqt count as an error; it\(aqll just produce lots of red in the output.
.SH LIMITATIONS
.sp
These are not bugs, because they\(aqre part of the design.
.sp
Only UTF\-8 and a couple of variants (WTF\-8, MUTF\-8) are supported.
There is no support for UTF\-16, UTF\-32, UTF\-EBCDIC, or any other non\-UTF\-8 encoding.
.sp
There\(aqs no support for any output number base except hex.
.sp
The input is read one byte at a time, so a search or regex match option would be difficult or impossible to implement.
.sp
Seeking backwards from the end of the file is impossible when reading from standard input.
The only way to fake this would be to read the whole input into memory at startup, which \fBuxd\fP doesn\(aqt do.
.SH BUGS
.sp
Combining characters are not handled well.
Or at all, really: the 2 characters being combined will have an ANSI color code in between.
urxvt at least ignores the color code, so the composite character displays in the color of the first (non\-combining) character.
I\(aqm not sure what a better solution would be...
.SH COPYRIGHT
.sp
Licensed under the WTFPL.
See \fI\%http://www.wtfpl.net/txt/copying/\fP for details.
.SH AUTHORS
.sp
B. Watson <\fI\%urchlay@slackware.uk\fP>
.SH SEE ALSO
.sp
xxd(1), bvi(1), utf\-8(7), unicode(7), console_codes(4)
.\" Generated by docutils manpage writer.
.