author | B. Watson <urchlay@slackware.uk> | 2024-12-18 07:05:01 -0500 |
---|---|---|
committer | B. Watson <urchlay@slackware.uk> | 2024-12-18 07:05:01 -0500 |
commit | d0b8532b703ef515b89eb8f34c0402262f3d3f7e (patch) | |
tree | 58ad6115dbf5ce685823aaf082558418be32c995 /uxd.rst | |
parent | c6ed5c95a56e55a2bb33c9dca819ddf377f05575 (diff) | |
download | uxd-d0b8532b703ef515b89eb8f34c0402262f3d3f7e.tar.gz |
add -j/-p/-w options.
Diffstat (limited to 'uxd.rst')
-rw-r--r-- | uxd.rst | 40 |
1 files changed, 37 insertions, 3 deletions
@@ -98,6 +98,15 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. print number of bytes/chars/ascii/multibyte/bad sequences.
 
+-j
+  Java mode (aka MUTF-8). Identical to UTF-8 except it allows the
+  overlong **0xc0 0x80** encoding for codepoint U+0000 (aka NUL),
+  which normally would be considered an error.
+  This may be useful for looking at serialized data created by Java
+  programs.
+
+.. java (MUTF-8) mode: allow 0xc0 0x80 for U+0000.
+
 -l length
   Stop dumping after *length* bytes (not characters). If the limit is
   reached in the middle of a multibyte character, the entire character
@@ -126,6 +135,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. added to hex offsets (decimal, 0x hex, 0 octal).
 
+-p
+  Permissive mode. Turns off error highlighting for overlongs, codepoints
+  above **U+10FFFF**, and surrogates. Only malformed sequences will be
+  highlighted in red.
+
 -r
   Highlight multi-byte sequences in reverse video, in the hex output. Ignored
   if **-m** given.
@@ -171,6 +185,11 @@ as *K*, *M*, and *G* for power-of-10 based (e.g. *1K* is 1000 bytes).
 
 .. print version of uxd.
 
+-w
+  WTF-8 mode. Surrogates **U+D800** to **U+DFFF** will not be considered errors.
+
+.. WTF-8 mode (allow surrogates).
+
 OUTPUT FORMAT
 =============
 
@@ -340,12 +359,27 @@
 Failure status will only be returned if **uxd** failed to open the input file.
 Invalid input (non-UTF-8) doesn't count as an error; it'll just have lots of red in the output.
 
+LIMITATIONS
+===========
+
+These are not bugs, because they're part of the design.
+
+Only UTF-8 and a couple of variants (WTF-8, MUTF-8) are supported.
+There is no support for UTF-16, UTF-32, UTF-EBCDIC, or any other
+non-UTF-8 encoding.
+
+There's no support for any number base except hex.
+
+The input is read one byte at a time, so a search or regex match
+option would be difficult or impossible to implement.
+
+Seeking backwards from the end of the file is impossible when reading
+from standard input. The only way to fake this would be to read the
+whole file into memory at startup, which **uxd** doesn't do.
+
 BUGS
 ====
 
-There should be options and/or a config file to change the colors,
-rather than baking them into the binary.
-
 Combining characters are not handled well. Or at all, really: the 2
 characters being combined will have an ANSI color code in between.
 urxvt at least ignores the color code, so the composite character
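The three new options differ only in which well-formed-but-invalid sequences they stop flagging. The Python sketch below illustrates that distinction for a single, complete byte sequence. It is not taken from uxd's source: the function name `classify`, its keyword arguments, and the returned labels are all hypothetical, and it assumes the modes behave as the man page text in this patch describes.

```python
def classify(seq: bytes, java: bool = False, wtf8: bool = False,
             permissive: bool = False) -> str:
    """Classify one encoded sequence (hypothetical helper, not uxd's code).

    java, wtf8 and permissive loosely mirror the -j, -w and -p options.
    Returns "ok", "malformed", "overlong", "out of range" or "surrogate".
    """
    if not seq:
        return "malformed"
    lead = seq[0]
    if lead < 0x80:
        return "ok" if len(seq) == 1 else "malformed"
    # Expected length, payload bits of the lead byte, and the smallest
    # codepoint that legitimately needs this many bytes.
    if 0xC0 <= lead <= 0xDF:
        need, cp, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        need, cp, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF7:
        need, cp, minimum = 4, lead & 0x07, 0x10000
    else:
        return "malformed"  # stray continuation byte or 0xF8..0xFF
    if len(seq) != need:
        return "malformed"
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            return "malformed"  # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    if java and seq == b"\xc0\x80":
        return "ok"  # MUTF-8: overlong NUL is allowed
    if permissive:
        return "ok"  # only malformed sequences are reported
    if cp < minimum:
        return "overlong"
    if cp > 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF and not wtf8:
        return "surrogate"
    return "ok"
```

For instance, `classify(b"\xed\xa0\x80")` reports a surrogate (U+D800) by default but accepts it with `wtf8=True`, while a truncated or broken sequence such as `b"\xc3\x28"` stays malformed in every mode.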