author B. Watson <urchlay@slackware.uk> 2024-12-18 07:05:01 -0500
committer B. Watson <urchlay@slackware.uk> 2024-12-18 07:05:01 -0500
commit d0b8532b703ef515b89eb8f34c0402262f3d3f7e
tree 58ad6115dbf5ce685823aaf082558418be32c995 /uxd.rst
parent c6ed5c95a56e55a2bb33c9dca819ddf377f05575
add -j/-p/-w options.
Diffstat (limited to 'uxd.rst')
-rw-r--r-- uxd.rst 40
1 files changed, 37 insertions, 3 deletions
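
Reviewer's sketch, not taken from the uxd source: the three options documented in this patch differ only in which kinds of well-formed but invalid sequences get flagged. The Python below is one possible way to express that, assuming a hypothetical function name (check), hypothetical keyword names (java, permissive, wtf8) standing in for -j, -w and -p, and assuming lead bytes 0xF8 through 0xFF count as malformed. How uxd actually implements this is not shown in the diff.

```python
# Smallest codepoint that legitimately needs each encoded length.
MIN_CP = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def seq_length(lead):
    """Return the sequence length implied by a lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # stray continuation byte, or 0xF8..0xFF (assumed malformed)


def check(seq, java=False, permissive=False, wtf8=False):
    """Return None if seq is accepted, else a short reason string.

    seq is the bytes of exactly one candidate character.
    """
    if not seq:
        return "malformed"
    n = seq_length(seq[0])
    if n == 0 or len(seq) != n:
        return "malformed"
    if n == 1:
        return None  # plain ASCII

    # Decode the raw codepoint, checking every continuation byte.
    cp = seq[0] & (0xFF >> (n + 1))
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            return "malformed"
        cp = (cp << 6) | (b & 0x3F)

    # Permissive mode (-p): only malformed sequences are errors.
    if permissive:
        return None
    if cp < MIN_CP[n]:
        # Java mode (-j): the single overlong 0xc0 0x80 is allowed.
        if java and seq == b"\xc0\x80":
            return None
        return "overlong"
    # Surrogate range as defined by Unicode: U+D800 through U+DFFF.
    if 0xD800 <= cp <= 0xDFFF:
        return None if wtf8 else "surrogate"  # WTF-8 mode (-w)
    if cp > 0x10FFFF:
        return "out of range"
    return None
```

For example, check(b"\xc0\x80") reports "overlong", while check(b"\xc0\x80", java=True) accepts it; a broken continuation byte is reported as "malformed" regardless of mode.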
diff --git a/uxd.rst b/uxd.rst
index 459de77..2220174 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -98,6 +98,15 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
.. print number of bytes/chars/ascii/multibyte/bad sequences.
+-j
+ Java mode (aka MUTF-8). Identical to UTF-8 except it allows the
+ overlong **0xc0 0x80** encoding for codepoint U+0000 (aka NUL),
+ which normally would be considered an error.
+ This may be useful for looking at serialized data created by Java
+ programs.
+
+.. java (MUTF-8) mode: allow 0xc0 0x80 for U+0000.
+
-l length
Stop dumping after *length* bytes (not characters). If the limit is
reached in the middle of a multibyte character, the entire character
@@ -126,6 +135,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
.. added to hex offsets (decimal, 0x hex, 0 octal).
+-p
+ Permissive mode. Turns off error highlighting for overlongs, codepoints
+ above **U+10FFFF**, and surrogates. Only malformed sequences will be
+ highlighted in red.
+
-r
Highlight multi-byte sequences in reverse video, in the hex
output. Ignored if **-m** given.
@@ -171,6 +185,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
.. print version of uxd.
+-w
+ WTF-8 mode. Surrogates **U+D800** to **U+DFFF** will not be considered errors.
+
+.. WTF-8 mode (allow surrogates).
+
OUTPUT FORMAT
=============
@@ -340,12 +359,27 @@ Failure status will only be returned if **uxd** failed to open the
input file. Invalid input (non-UTF-8) doesn't count as an error;
it'll just have lots of red in the output.
+LIMITATIONS
+===========
+
+These are not bugs, because they're part of the design.
+
+Only UTF-8 and a couple of variants (WTF-8, MUTF-8) are supported.
+There is no support for UTF-16, UTF-32, UTF-EBCDIC, or any other
+non-UTF-8 encoding.
+
+There's no support for any number base except hex.
+
+The input is read one byte at a time, so a search or regex match
+option would be difficult or impossible to implement.
+
+Seeking backwards from the end of the file is impossible when reading
+from standard input. The only way to fake this would be to read the
+whole file into memory at startup, which **uxd** doesn't do.
+
BUGS
====
-There should be options and/or a config file to change the colors,
-rather than baking them into the binary.
-
Combining characters are not handled well. Or at all, really: the 2
characters being combined will have an ANSI color code in between.
urxvt at least ignores the color code, so the composite character