uxd (Unicode-aware Hex Dumper)

Hex dump utility that uses color to indicate multi-byte UTF-8
sequences.

As usual for hex dumps, output is columnar. The rightmost column
(which would be ASCII in a regular hex dump) shows one Unicode
character for each UTF-8 sequence in the dump.

UTF-8 sequences in the hex column are color-coded to match their
character in the right column. Colors cycle through a set of 4, to
help keep track of which character goes with which byte sequence.

Sample output:

00000000: 41 e2 98 af e2 98 ae c2 bf c3 a1 e2 88 9e 42 0a  A☯☮¿á∞B↵
[colors]  1  2        3        4     1        2     3  5   12341235

Color numbers used above, and the ANSI color each one maps to
(uxd color = ANSI color):

;   0 black (don't use)
5 = 1 red
1 = 2 green
4 = 3 yellow
;   4 blue (don't use)
2 = 5 purple
3 = 6 cyan
;   7 white (don't use)
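
The table above can be turned into terminal escape sequences. What
follows is a minimal sketch in Python, assuming colors are emitted as
ANSI SGR codes (30-37 for foreground, 7 for reverse video); this README
does not say how uxd actually produces color, and the names ANSI_COLOR,
sgr and paint are hypothetical.

```python
# Hypothetical mapping copied from the table above: uxd color -> ANSI color.
ANSI_COLOR = {1: 2, 2: 5, 3: 6, 4: 3, 5: 1}

RESET = "\x1b[0m"  # SGR 0 resets all attributes


def sgr(color: int, reverse: bool = False) -> str:
    """Return the ANSI SGR escape sequence for a uxd color number."""
    codes = []
    if reverse:
        codes.append("7")  # SGR 7 = reverse video
    codes.append(str(30 + ANSI_COLOR[color]))  # SGR 30..37 = foreground
    return "\x1b[" + ";".join(codes) + "m"


def paint(text: str, color: int, reverse: bool = False) -> str:
    """Wrap text in the escape sequence for a uxd color, then reset."""
    return sgr(color, reverse) + text + RESET
```

For example, print(paint("e2 98 ae", 3)) would show those three hex
bytes in cyan on a terminal that understands ANSI escapes.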

Colors 1 to 4 are used for successive Unicode characters. For
instance, color 3 is used for the ☮ character, and also for its hex
representation "e2 98 ae" in the dump. Note that the "A" and "B" are
in the ASCII subset of Unicode, and are treated as one-byte sequences.
If there's a BOM, it'll be in reverse video color 1 (green), and the
printable form of it will likely be "BOM".

Color 5 (red) is for unprintable characters: Unicode codepoints below
0x20 (aka "control characters"), plus a few others like 0x7f (delete).
Newlines are shown as ↵. Note that an actual ↵ character will also be
displayed as ↵, but in one of the 4 alternating colors.
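
The color assignment described so far can be sketched in a few lines
of Python. This is an illustration only, under some assumptions: the
input is valid UTF-8, the only unprintable codepoint above 0x1f is
0x7f, and color 5 characters do not use up a slot in the 1-4 rotation
(the sample cannot confirm this, since its only control character
comes last). BOM handling is left out. The function name
assign_colors is hypothetical.

```python
def assign_colors(data: bytes):
    """Return (hex bytes, character, uxd color) for each UTF-8 sequence.

    Assumes data is valid UTF-8; invalid input raises UnicodeDecodeError.
    """
    result = []
    slot = 0  # position in the 1..4 rotation
    for ch in data.decode("utf-8"):
        raw = ch.encode("utf-8")  # the bytes that made up this character
        cp = ord(ch)
        if cp < 0x20 or cp == 0x7F:
            color = 5  # unprintable: control characters and delete
        else:
            color = slot % 4 + 1  # cycle through colors 1, 2, 3, 4
            slot += 1  # assumption: only printable characters advance
        result.append((raw.hex(" "), ch, color))
    return result


sample = bytes.fromhex("41e298afe298aec2bfc3a1e2889e420a")
color_row = "".join(str(color) for _, _, color in assign_colors(sample))
print(color_row)  # 12341235, matching the sample output
```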

Not shown in the dump: byte sequences that have the high bit(s) set,
but are not valid UTF-8, will be shown in color 5 (red), but in
reverse video.
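
One way to find such bytes, sketched in Python: decode with the
standard "surrogateescape" error handler, which turns every
undecodable byte into a lone surrogate in the range U+DC80..U+DCFF.
This is a technique chosen for illustration, not necessarily how uxd
does it, and the function name split_valid_invalid is hypothetical.

```python
def split_valid_invalid(data: bytes):
    """Split data into (bytes, is_valid) pieces.

    Each valid UTF-8 sequence becomes one piece; each byte that is not
    part of a valid sequence becomes its own piece with is_valid False.
    """
    pieces = []
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Escaped byte: recover the original value (0x80..0xff).
            pieces.append((bytes([cp - 0xDC00]), False))
        else:
            pieces.append((ch.encode("utf-8"), True))
    return pieces
```

Pieces flagged False are the candidates for color 5 in reverse video;
the rest go through the normal color assignment.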

Usage: uxd [options] [<filename> ...]

Options should be based on xxd(1) options, though not all of them will
be supported. If uxd-specific options exist, they should ideally use
letters that xxd doesn't, to avoid confusion.

Ideas:
support other encodings for Unicode, like UTF-16?