From c41ec69dbc29465ee338125f4f08168e6fcdee86 Mon Sep 17 00:00:00 2001
From: "B. Watson"
Date: Thu, 12 Dec 2024 06:26:19 -0500
Subject: trim README, minor man update.

---
 README  | 54 +++++------------------------------------------------
 uxd.1   |  7 ++++---
 uxd.rst |  3 ++-
 3 files changed, 11 insertions(+), 53 deletions(-)

diff --git a/README b/README
index dead264..755501e 100644
--- a/README
+++ b/README
@@ -1,51 +1,7 @@
-uxd (Unicode-aware Hex Dumper)
+uxd (UTF-8-aware Hex Dumper)
 
-Hex dump utility that uses color to indicate multi-byte UTF-8
-sequences.
+uxd is a hex dump utility that's aware of UTF-8 multibyte sequence
+semantics, and uses colorized output to indicate which byte
+sequences go with which human-readable characters.
 
-As usual for hex dumps, output is columnar. The rightmost column
-(which would be ASCII in a regular hex dump) shows one Unicode
-character for each UTF-8 sequence in the dump.
-
-Unicode sequences in the hex column are color-coded to match their
-character in the right column. Colors alternate between a set of 4,
-to help keep track of which character goes with with byte sequence.
-
-Sample output:
-
-00000000: 41 e2 98 af e2 98 ae c2 bf c3 a1 e2 88 9e 42 0a  A☯☮¿á∞B↵
-[colors]  1  2        3        4     1     2        3  5   12341235
-
-; 0 black (don't use)
-5 = 1 red
-1 = 2 green
-4 = 3 yellow
-; 4 blue (don't use)
-2 = 5 purple
-3 = 6 cyan
-; 7 white (don't use)
-
-Colors 1 to 4 are used for successive Unicode characters. For
-instance, color 3 is used for the ☮ character, and also for its hex
-representation "e2 98 ae" in the dump. Note that the "A" and "B" are
-in the ASCII subset of Unicode, and are treated as one-byte sequences.
-If there's a BOM, it'll be in reverse video color 1 (green), and the
-printable form of it will likely be "BOM".
-
-Color 5 is for unprintable characters, with Unicode codepoints below
-0x20 (aka "control characters"), plus a few others like 0x7f (delete).
-↵ is used for newlines... note that an actual ↵ character will
-also be displayed as ↵, but in one of the 4 alternating colors.
-
-Not shown in the dump: byte sequences that have the high bit(s) set,
-but are not valid UTF-8, will be shown in color 5 (red), but in
-reverse video.
-
-Usage: uxd [options] [ ...]
-
-Options should be based on xxd(1) options, though not all of them will
-be supported. If uxd-specific options exist, they should ideally use
-letters that xxd doesn't, to avoid confusion.
-
-Ideas:
-support other encodings for Unicode, like UTF-16?
+See uxd.rst for full documentation, or (after installation), "man uxd".
diff --git a/uxd.1 b/uxd.1
index 87886b3..68c4554 100644
--- a/uxd.1
+++ b/uxd.1
@@ -36,7 +36,8 @@ uxd [\fIfile\fP | \fI\-\fP]
 .SH DESCRIPTION
 .sp
 \fBuxd\fP is a hex dump utility that\(aqs aware of UTF\-8 multibyte sequence
-semantics.
+semantics, and uses colorized output to indicate which byte
+sequences go with which human\-readable characters.
 .sp
 Input is read from \fIfile\fP, or standard input if \fIfile\fP is missing or
 given as \fB\-\fP\&. The input is treated as UTF\-8 encoded Unicode. Since
@@ -66,10 +67,10 @@ There are no options yet.
 It\(aqs hard to give a proper example, since man pages don\(aqt support
 color. You\(aqll have to use your imagination. Also, this section of the
 man page requires your man command to support UTF\-8 embedded in
-the man page. If the example looks mangled, try viewing the source
+the man page. If the examples looks mangled, try viewing the source
 (uxd.rst) in a text editor.
 .sp
-Japanese characters:
+Japanese text example:
 .INDENT 0.0
 .INDENT 3.5
 .sp
diff --git a/uxd.rst b/uxd.rst
index e5a8fff..f6f3bd3 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -23,7 +23,8 @@ DESCRIPTION
 ===========
 
 **uxd** is a hex dump utility that's aware of UTF-8 multibyte sequence
-semantics.
+semantics, and uses colorized output to indicate which byte
+sequences go with which human-readable characters.
 
 Input is read from *file*, or standard input if *file* is missing or
 given as **-**. The input is treated as UTF-8 encoded Unicode. Since
-- 
cgit v1.2.3