Diffstat (limited to 'README')
-rw-r--r-- README 51
1 files changed, 51 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..dead264
--- /dev/null
+++ b/README
@@ -0,0 +1,51 @@
+uxd (Unicode-aware Hex Dumper)
+
+Hex dump utility that uses color to indicate multi-byte UTF-8
+sequences.
+
+As usual for hex dumps, output is columnar. The rightmost column
+(which would be ASCII in a regular hex dump) shows one Unicode
+character for each UTF-8 sequence in the dump.
+
+UTF-8 sequences in the hex column are color-coded to match their
+character in the right column. Colors alternate among a set of 4,
+to help keep track of which character goes with which byte sequence.
+
+Sample output:
+
+00000000: 41 e2 98 af e2 98 ae c2 bf c3 a1 e2 88 9e 42 0a  A☯☮¿á∞B↵
+[colors]  1  2        3        4     1     2        3  5   12341235
+
+;   0 black (don't use)
+5 = 1 red
+1 = 2 green
+4 = 3 yellow
+;   4 blue (don't use)
+2 = 5 purple
+3 = 6 cyan
+;   7 white (don't use)
+
+Colors 1 to 4 are used for successive Unicode characters. For
+instance, color 3 is used for the ☮ character, and also for its hex
+representation "e2 98 ae" in the dump. Note that the "A" and "B" are
+in the ASCII subset of Unicode, and are treated as one-byte sequences.
+If there's a BOM, it'll be in reverse video color 1 (green), and the
+printable form of it will likely be "BOM".
+
+Color 5 is for unprintable characters: Unicode code points below
+0x20 (aka "control characters"), plus a few others like 0x7f (delete).
+↵ is used for newlines. Note that an actual ↵ character will
+also be displayed as ↵, but in one of the 4 alternating colors.
+
+Not shown in the sample: byte sequences that have the high bit set
+but are not valid UTF-8 will also be shown in color 5 (red), in
+reverse video.
+
+Usage: uxd [options] [<filename> ...]
+
+Options should be based on xxd(1) options, though not all of them will
+be supported. If uxd-specific options exist, they should ideally use
+letters that xxd doesn't, to avoid confusion.
+
+Ideas:
+support other encodings for Unicode, like UTF-16?
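
The coloring scheme this README describes could be sketched roughly as
follows. This is not the uxd implementation, only a minimal Python
illustration. All names here (split_utf8, assign_colors, sgr,
dump_line) are hypothetical. It assumes the right-hand numbers in the
color table are ANSI foreground colors, that color-5 bytes do not
advance the 4-color cycle, and that "." is the placeholder for
unprintable or invalid bytes. The README specifies none of these. BOM
handling and sequences that straddle a line boundary are left out.

```python
# uxd color number -> ANSI color digit (assumed meaning of the README table).
ANSI = {1: 2, 2: 5, 3: 6, 4: 3, 5: 1}

def seq_len(lead):
    """Length of the UTF-8 sequence a lead byte announces, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that never starts a sequence

def split_utf8(data):
    """Yield (chunk, char); char is None for a byte that is not valid UTF-8."""
    i = 0
    while i < len(data):
        n = seq_len(data[i])
        chunk = data[i:i + n]
        char = None
        if n and len(chunk) == n:
            try:
                char = chunk.decode("utf-8")  # strict: rejects overlongs etc.
            except UnicodeDecodeError:
                char = None
        if char is None:
            yield data[i:i + 1], None  # emit the bad byte alone and resync
            i += 1
        else:
            yield chunk, char
            i += n

def assign_colors(data):
    """Return a list of (chunk, display, color, reverse_video) tuples."""
    out, k = [], 0
    for chunk, char in split_utf8(data):
        if char is None:
            out.append((chunk, ".", 5, True))    # invalid: red, reverse video
        elif char == "\n":
            out.append((chunk, "↵", 5, False))   # newline marker
        elif ord(char) < 0x20 or ord(char) == 0x7F:
            out.append((chunk, ".", 5, False))   # other control characters
        else:
            out.append((chunk, char, k % 4 + 1, False))
            k += 1                               # cycle through colors 1..4
    return out

def sgr(text, color, reverse=False):
    """Wrap text in ANSI SGR codes for the given uxd color."""
    codes = f"3{ANSI[color]}" + (";7" if reverse else "")
    return f"\x1b[{codes}m{text}\x1b[0m"

def dump_line(offset, data):
    """Format one dump line: offset, colored hex column, colored text column."""
    items = assign_colors(data)
    hexcol = " ".join(sgr(chunk.hex(" "), c, r) for chunk, _, c, r in items)
    txtcol = "".join(sgr(d, c, r) for _, d, c, r in items)
    return f"{offset:08x}: {hexcol}  {txtcol}"

if __name__ == "__main__":
    sample = bytes.fromhex("41e298afe298aec2bfc3a1e2889e420a")
    print(dump_line(0, sample))
```

Run on the 16 sample bytes from the README, assign_colors gives the
color sequence 1 2 3 4 1 2 3 5, matching the [colors] line. Whether
control characters should consume a slot in the cycle cannot be decided
from that sample alone.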