From 06810df1c053091256f2d32589f359f40269fddc Mon Sep 17 00:00:00 2001
From: "B. Watson"
Date: Mon, 3 Feb 2025 04:33:03 -0500
Subject: bsgrep: document locale weirdness.

---
 bsgrep | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

(limited to 'bsgrep')

diff --git a/bsgrep b/bsgrep
index d4ef311..ff98f07 100755
--- a/bsgrep
+++ b/bsgrep
@@ -342,8 +342,9 @@ on standard error:
 bsgrep: : stripping carriage returns

 The input file has MS-DOS/Windows CRLF line endings. B's
-output will have these removed. Note that Unix-flavored tools that
-understand continuation lines will generally fail when fed CRLF files.
+output will have these removed. Note that other Unix-flavored tools
+that understand continuation lines will generally fail when fed CRLF
+files.

 bsgrep: , line : whitespace after continuation, malformed input?

@@ -360,6 +361,14 @@ to continue onto, so this is almost certainly an error.

 The above warnings don't affect the exit status.

+=head1 ENVIRONMENT
+
+B doesn't define any environment variables of its own, but
+it does pay attention to B, B, and B. If any
+of these contain the string I, the input and output will be
+read/written as Unicode, encoded as UTF-8. If the input turns out not
+to be Unicode, it will be assumed ISO-8859-1, and converted to Unicode.
+
 =head1 EXIT STATUS

 0 if there were any matches, 1 if there were none, or 2 if there
@@ -372,7 +381,7 @@ B's exit status.
 B doesn't detect binary files like B does. It can and will
 print them to your terminal instead of "binary file matches".

-Not all b options are supported. Options that aren't implemented
+Not all B options are supported. Options that aren't implemented
 but might be someday include B<--color>, B<-a>, B<-A>, B<-B>, B<-C>,
 B<-o>. I don't intend to implement every single option B has,
 there are too many of them.
@@ -382,6 +391,13 @@ There are no long options other than B<--help> and B<--version>.
 B does not comply with the POSIX (or any other) standard for B,
 and does not intend do.

+Locale support isn't quite the same as B: in a UTF-8 locale,
+if the input isn't plain ASCII or valid UTF-8, it will be treated
+as ISO-8859-1, internally converted to Unicode, and output will be
+UTF-8. This isn't intended; it's a side-effect of how Perl UTF-8
+filehandles work. In non-UTF-8 locales, things should work as
+expected. I hope.
+
 =head1 AUTHOR

 B was written by B. Watson and released
-- 
cgit v1.2.3