author | B. Watson <urchlay@slackware.uk> | 2024-12-12 06:21:05 -0500 |
---|---|---|
committer | B. Watson <urchlay@slackware.uk> | 2024-12-12 06:21:05 -0500 |
commit | 4df7bb4d762ff945fb7a823cb4c153cab7e3c273 (patch) | |
tree | bd58ead44eb3ff2c3e0d2935144bbe663d845b1a /README | |
download | uxd-4df7bb4d762ff945fb7a823cb4c153cab7e3c273.tar.gz |
initial commit
Diffstat (limited to 'README')
-rw-r--r-- | README | 51 |
1 files changed, 51 insertions, 0 deletions
@@ -0,0 +1,51 @@
+uxd (Unicode-aware Hex Dumper)
+
+Hex dump utility that uses color to indicate multi-byte UTF-8
+sequences.
+
+As usual for hex dumps, output is columnar. The rightmost column
+(which would be ASCII in a regular hex dump) shows one Unicode
+character for each UTF-8 sequence in the dump.
+
+Unicode sequences in the hex column are color-coded to match their
+character in the right column. Colors alternate between a set of 4,
+to help keep track of which character goes with which byte sequence.
+
+Sample output:
+
+00000000: 41 e2 98 af e2 98 ae c2 bf c3 a1 e2 88 9e 42 0a A☯☮¿á∞B↵
+[colors]  1  2        3        4     1     2        3  5  12341235
+
+;   0 black (don't use)
+5 = 1 red
+1 = 2 green
+4 = 3 yellow
+;   4 blue (don't use)
+2 = 5 purple
+3 = 6 cyan
+;   7 white (don't use)
+
+Colors 1 to 4 are used for successive Unicode characters. For
+instance, color 3 is used for the ☮ character, and also for its hex
+representation "e2 98 ae" in the dump. Note that the "A" and "B" are
+in the ASCII subset of Unicode, and are treated as one-byte sequences.
+If there's a BOM, it'll be in reverse video color 1 (green), and the
+printable form of it will likely be "BOM".
+
+Color 5 is for unprintable characters, with Unicode codepoints below
+0x20 (aka "control characters"), plus a few others like 0x7f (delete).
+↵ is used for newlines... note that an actual ↵ character will
+also be displayed as ↵, but in one of the 4 alternating colors.
+
+Not shown in the dump: byte sequences that have the high bit(s) set,
+but are not valid UTF-8, will be shown in color 5 (red), but in
+reverse video.
+
+Usage: uxd [options] [<filename> ...]
+
+Options should be based on xxd(1) options, though not all of them will
+be supported. If uxd-specific options exist, they should ideally use
+letters that xxd doesn't, to avoid confusion.
+
+Ideas:
+support other encodings for Unicode, like UTF-16?
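
The coloring rules in the README can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the uxd source: the function names (`utf8_length`, `split_sequences`, `assign_colors`, `render_hex`) are hypothetical, and several details are assumptions the README does not settle. In particular, the sketch assumes that control characters and invalid bytes do not advance the 4-color rotation, that an invalid sequence is reported one byte at a time, and that 0x7f is the only "other" unprintable character. BOM handling is left out.

```python
# Illustrative sketch of the README's coloring rules; NOT the uxd implementation.
# All names below are hypothetical.

SAMPLE = bytes.fromhex("41e298afe298aec2bfc3a1e2889e420a")  # bytes from the sample dump

# README color number -> ANSI color number, copied from the README's table.
ANSI = {1: 2, 2: 5, 3: 6, 4: 3, 5: 1}


def utf8_length(lead):
    """Expected sequence length for a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte, or a byte never valid as a UTF-8 lead


def split_sequences(data):
    """Yield (chunk, is_valid); an invalid byte is yielded alone (assumption)."""
    i = 0
    while i < len(data):
        n = utf8_length(data[i])
        chunk = data[i:i + n]
        valid = False
        if n and len(chunk) == n:
            try:
                chunk.decode("utf-8")  # strict: rejects overlongs and surrogates
                valid = True
            except UnicodeDecodeError:
                pass
        if not valid:
            chunk = data[i:i + 1]
        yield chunk, valid
        i += len(chunk)


def assign_colors(data):
    """Return (chunk, readme_color, reverse_video) triples."""
    out = []
    slot = 0  # next alternating color index, 0..3
    for chunk, valid in split_sequences(data):
        if not valid:
            out.append((chunk, 5, True))    # invalid UTF-8: color 5, reverse video
        elif chunk[0] < 0x20 or chunk[0] == 0x7F:
            out.append((chunk, 5, False))   # control character: color 5
        else:
            out.append((chunk, slot + 1, False))
            slot = (slot + 1) % 4           # assumption: only printables rotate
    return out


def render_hex(data):
    """Render the hex column with ANSI SGR escapes (7 = reverse, 3x = foreground)."""
    parts = []
    for chunk, color, reverse in assign_colors(data):
        sgr = f"\x1b[{'7;' if reverse else ''}3{ANSI[color]}m"
        parts.append(sgr + chunk.hex(" ") + "\x1b[0m")
    return " ".join(parts)


if __name__ == "__main__":
    print([color for _, color, _ in assign_colors(SAMPLE)])
    print(render_hex(SAMPLE))
```

Run against the bytes from the sample dump, `assign_colors` yields the color sequence 1 2 3 4 1 2 3 5, matching the `[colors]` line in the README.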