uxd (Unicode-aware Hex Dumper)
Hex dump utility that uses color to indicate multi-byte UTF-8
sequences.
As usual for hex dumps, output is columnar. The rightmost column
(which would be ASCII in a regular hex dump) shows one Unicode
character for each UTF-8 sequence in the dump.
UTF-8 sequences in the hex column are color-coded to match their
character in the right column. Colors cycle through a set of 4,
to help keep track of which character goes with which byte sequence.
Sample output:
00000000: 41 e2 98 af e2 98 ae c2 bf c3 a1 e2 88 9e 42 0a  A☯☮¿á∞B↵
[colors]  1  2        3        4     1     2        3  5   12341235
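Each line below maps a uxd color (left) to an ANSI color number and
name (right). Lines starting with ";" are ANSI colors that uxd does
not use.
    ; 0 black  (don't use)
5 = 1 red
1 = 2 green
4 = 3 yellow
    ; 4 blue   (don't use)
2 = 5 purple
3 = 6 cyan
    ; 7 white  (don't use)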
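As a rough illustration of the mapping above, here is a minimal Python
sketch that turns a uxd color into a terminal escape sequence. It assumes
standard ANSI SGR codes (foreground = 30 + ANSI color number, 7 = reverse
video, 0 = reset). The names UXD_TO_ANSI, sgr and paint are hypothetical
and not part of uxd itself.

```python
# Hypothetical helper: uxd color -> ANSI color number, per the table above.
UXD_TO_ANSI = {1: 2, 2: 5, 3: 6, 4: 3, 5: 1}

RESET = "\x1b[0m"  # SGR 0: reset all attributes


def sgr(uxd_color, reverse=False):
    """Return the SGR escape sequence selecting a uxd color."""
    codes = [str(30 + UXD_TO_ANSI[uxd_color])]  # 30..37 = ANSI foreground
    if reverse:
        codes.append("7")  # SGR 7: reverse video
    return "\x1b[" + ";".join(codes) + "m"


def paint(text, uxd_color, reverse=False):
    """Wrap text in the escape sequences for a uxd color, then reset."""
    return sgr(uxd_color, reverse) + text + RESET


if __name__ == "__main__":
    print(paint("e2 98 ae", 3), paint("ff", 5, reverse=True))
```

Keeping the logical colors (1 to 5) separate from the ANSI numbers means
the palette could be changed in one place without touching the rest of
the dump logic.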
Colors 1 to 4 are used for successive Unicode characters. For
instance, color 3 is used for the ☮ character, and also for its hex
representation "e2 98 ae" in the dump. Note that the "A" and "B" are
in the ASCII subset of Unicode, and are treated as one-byte sequences.
If there's a BOM, it'll be in reverse video color 1 (green), and the
printable form of it will likely be "BOM".
Color 5 is for unprintable characters: Unicode code points below
0x20 (aka "control characters"), plus a few others like 0x7f (delete).
↵ stands in for newlines. Note that an actual ↵ character (U+21B5)
will also be displayed as ↵, but in one of the 4 alternating colors.
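The color rules described so far can be sketched in a few lines of
Python. This is only an illustration under several assumptions: the
input is already valid UTF-8, control characters do not advance the
4-color cycle, only 0x7f is handled from the "few others", and "." is
used as a placeholder for control characters other than newline. The
function name color_sequences is hypothetical.

```python
def color_sequences(data):
    """Return (utf8_bytes, shown_char, uxd_color) for each character.

    Assumes data is valid UTF-8. Printable characters cycle through
    colors 1..4; control characters get color 5.
    """
    out = []
    printable_seen = 0  # assumption: only printable chars advance the cycle
    for ch in data.decode("utf-8"):
        seq = ch.encode("utf-8")
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            # "." for non-newline control characters is an assumption
            shown = "↵" if ch == "\n" else "."
            out.append((seq, shown, 5))
        else:
            out.append((seq, ch, 1 + printable_seen % 4))
            printable_seen += 1
    return out


if __name__ == "__main__":
    sample = bytes.fromhex("41e298afe298aec2bfc3a1e2889e420a")
    for seq, shown, color in color_sequences(sample):
        print(seq.hex(" "), shown, color)
```

Run on the 16 bytes from the sample output, this yields the color row
1 2 3 4 1 2 3 5 and the right-hand column A☯☮¿á∞B↵.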
Not shown in the dump: byte sequences that have the high bit(s) set,
but are not valid UTF-8, will be shown in color 5 (red), but in
reverse video.
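One way to find the bytes that would get the reverse-video treatment is
to walk the input, guess each sequence's length from its lead byte, and
let a strict UTF-8 decoder confirm it. The sketch below does that in
Python. Flagging invalid bytes one at a time is an assumption, as is the
function name split_utf8; the source does not say how runs of bad bytes
are grouped.

```python
def split_utf8(data):
    """Split data into (chunk, is_valid) pairs.

    Each valid chunk is one complete UTF-8 sequence. Anything else is
    reported one byte at a time with is_valid set to False.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1
        elif 0xC2 <= lead <= 0xDF:
            n = 2
        elif 0xE0 <= lead <= 0xEF:
            n = 3
        elif 0xF0 <= lead <= 0xF4:
            n = 4
        else:
            n = 0  # continuation byte or a byte that can never lead
        chunk = data[i:i + n]
        valid = False
        if n and len(chunk) == n:
            try:
                # Strict decoding also rejects overlong forms and surrogates.
                chunk.decode("utf-8")
                valid = True
            except UnicodeDecodeError:
                pass
        if valid:
            out.append((chunk, True))
            i += n
        else:
            out.append((data[i:i + 1], False))
            i += 1
    return out


if __name__ == "__main__":
    for chunk, ok in split_utf8(b"A\xe2\x98\xae\xff\x80B"):
        print(chunk.hex(" "), "ok" if ok else "INVALID")
```

Resynchronizing after a single bad byte keeps one stray byte from
hiding the valid characters that follow it.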
Usage: uxd [options] [<filename> ...]
Options should be based on xxd(1) options, though not all of them will
be supported. If uxd-specific options exist, they should ideally use
letters that xxd doesn't, to avoid confusion.
Ideas:
support other encodings for Unicode, like UTF-16?