add -j/-p/-w options.

author: B. Watson <urchlay@slackware.uk> 2024-12-18 07:05:01 -0500
committer: B. Watson <urchlay@slackware.uk> 2024-12-18 07:05:01 -0500
commit: d0b8532b703ef515b89eb8f34c0402262f3d3f7e (patch)
tree: 58ad6115dbf5ce685823aaf082558418be32c995 /uxd.rst
parent: c6ed5c95a56e55a2bb33c9dca819ddf377f05575 (diff)
download: uxd-d0b8532b703ef515b89eb8f34c0402262f3d3f7e.tar.gz
1 files changed, 37 insertions, 3 deletions
diff --git a/uxd.rst b/uxd.rst
index 459de77..2220174 100644
--- a/uxd.rst
+++ b/uxd.rst
@@ -98,6 +98,15 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. print number of bytes/chars/ascii/multibyte/bad sequences.
 
+-j
+  Java mode (aka MUTF-8). Identical to UTF-8 except it allows the
+  overlong **0xc0 0x80** encoding for codepoint U+0000 (aka NUL),
+  which normally would be considered an error.
+  This may be useful for looking at serialized data created by Java
+  programs.
+
+.. java (MUTF-8) mode: allow 0xc0 0x80 for U+0000.
+
 -l length
   Stop dumping after *length* bytes (not characters). If the limit is
   reached in the middle of a multibyte character, the entire character
@@ -126,6 +135,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. added to hex offsets (decimal, 0x hex, 0 octal).
 
+-p
+  Permissive mode. Turns off error highlighting for overlongs, codepoints
+  above **U+10FFFF**, and surrogates. Only malformed sequences will be
+  highlighted in red.
+
 -r
   Highlight multi-byte sequences in reverse video, in the hex
   output. Ignored if **-m** given.
@@ -171,6 +185,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. print version of uxd.
 
+-w
+  WTF-8 mode. Surrogates **U+D800** to **U+DFFF** will not be considered errors.
+
+.. WTF-8 mode (allow surrogates).
+
 OUTPUT FORMAT
 =============
 
@@ -340,12 +359,27 @@ Failure status will only be returned if **uxd** failed to open the
 input file. Invalid input (non-UTF-8) doesn't count as an error;
 it'll just have lots of red in the output.
 
+LIMITATIONS
+===========
+
+These are not bugs, because they're part of the design.
+
+Only UTF-8 and a couple of variants (WTF-8, MUTF-8) are supported.
+There is no support for UTF-16, UTF-32, UTF-EBCDIC, or any other
+non-UTF-8 encoding.
+
+There's no support for any number base except hex.
+
+The input is read one byte at a time, so a search or regex match
+option would be difficult or impossible to implement.
+
+Seeking backwards from the end of the file is impossible when reading
+from standard input. The only way to fake this would be to read the
+whole file into memory at startup, which **uxd** doesn't do.
+
 BUGS
 ====
 
-There should be options and/or a config file to change the colors,
-rather than baking them into the binary.
-
 Combining characters are not handled well. Or at all, really: the 2
 characters being combined will have an ANSI color code in between.
 urxvt at least ignores the color code, so the composite character
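
The three new options can be read as a small decision table over one decoded sequence. The following is a rough sketch of that table in Python, not the code from this commit: the names `decode_one` and `is_error` are hypothetical, and it assumes that **0xc0**/**0xc1** lead bytes count as overlong rather than malformed, and that **-j** exempts only the exact bytes **0xc0 0x80**. How **uxd** itself handles those details may differ.

```python
# Hypothetical sketch of the -j / -p / -w rules described in uxd.rst.
# This is NOT uxd's implementation; names and edge-case choices are assumptions.

# Smallest codepoint that legitimately needs a sequence of each length.
MIN_FOR_LEN = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_one(seq: bytes):
    """Decode one UTF-8-style sequence; return (codepoint, length) or None if malformed."""
    if not seq:
        return None
    b0 = seq[0]
    if b0 < 0x80:
        n, cp = 1, b0
    elif 0xC0 <= b0 <= 0xDF:
        n, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 <= 0xEF:
        n, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 <= 0xF7:
        n, cp = 4, b0 & 0x07
    else:
        return None  # stray continuation byte or invalid lead byte
    if len(seq) != n:
        return None  # truncated or too long
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            return None  # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    return cp, n


def is_error(seq: bytes, java=False, permissive=False, wtf8=False) -> bool:
    """Return True if the sequence would be flagged under the given options."""
    decoded = decode_one(seq)
    if decoded is None:
        return True  # malformed sequences are errors in every mode
    cp, n = decoded
    if permissive:
        return False  # -p: overlongs, surrogates, > U+10FFFF all pass
    if cp < MIN_FOR_LEN[n]:
        # Overlong encoding; -j exempts only 0xc0 0x80 (U+0000).
        return not (java and seq == b"\xc0\x80")
    if 0xD800 <= cp <= 0xDFFF:
        return not wtf8  # -w: surrogates are allowed
    return cp > 0x10FFFF
```

For instance, `is_error(b"\xc0\x80")` is true by default and false with `java=True`, while a truncated sequence such as `b"\xc3"` stays an error even with `permissive=True`.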