INTERNATIONALIZATION AND LOCALIZATION OF SHELL SCRIPTS

3 - Internationalization process
--------------------------------

This chapter is intended for developers and maintainers.

The internationalization process comprises following tasks:
1. Prepare scripts for internationalization
2. Mark messages to be localized
3. Use 'xgettext' to produce a template catalog of messages

(1) Prepare scripts for internationalization

This task is needed for shell scripts which do not yet fulfill
requirements for internationalization.

*** Gettext's requirements for shell scripts. ***

The list of requirements below is not complete: it includes only the main
ones that I recommend the developer or maintainer to check, based on my
experience.

Gettext replaces at run time text strings output of:
(1) an "echo" command or
(2) a program (like 'dialog', for instance)
with translated text strings (found in a messages catalog for the language
set by $LANG or $LANGUAGE) if following conditions are fulfilled:
- A MO file is available in the path computed from the TEXTDOMAIN
  environment variable as dir_name/locale/LC_MESSAGES/text_domain.mo.
  For instance, if TEXTDOMAIN=software and $LANG=de_DE.utf8, gettext will
  look for: dir_name/de_DE.utf8/LC_MESSAGES/software.mo
  dir_name can be set through the value of the TEXTDOMAINDIR environment
  variable, otherwise a default value is used.
  In Slackware Linux for instance, the default value is /usr/share/locale.
- TEXTDOMAIN variable is exported before any *gettext command occurs. 
- gettext.sh, which provides the eval_gettext and eval_ngettext functions,
  is sourced before any occurrence of one of these functions.
- A msgid string in the MO file matches exactly the argument of gettext
  (or eval_gettext if the text string includes a parameter expansion).
- The corresponding msgstr string does not include a backslash followed by
  a white space.
- The msgstr string begins and ends with a newline or not, as the msgid
  does.
- If the text string includes a parameter expansion, eval_gettext is used
  instead of gettext.
- "The variable names must consist solely of alphanumeric or underscore
  ASCII characters, not start with a digit and be nonempty; otherwise such
  a variable reference is ignored." (gettext manual)
- Parameter expansions are escaped with a single backslash like this:
  \$parameter or \${parameter}, unless the eval_gettext command be
  inside a  command substitution like this:
  `eval_gettext "..."`" or   "`$(eval_gettext "...")".
  In the latter case, three backslashes is used like this:
  \\\$parameter or \\\${parameter}.
- Only the forms $parameter and ${parameter} of parameter expansion are
  used inside an eval_gettext's argument (all other ones are forbidden).
- Positional parameters, special parameters and command substitutions are
  *not* used inside a gettext's or eval_gettext's argument.

As a practical consequence of the two last rules, it is advisable that all
positional parameters, special parameters, command substitutions and not
allowed forms of parameter substitutions be assigned upstream to named
variables, then expanded in the text string argument of eval_gettext or
eval_negettext.

Tip: if a text string has been included as a msgid in a catalog of
messages and is assigned to a named variable in a script, then the
commands: "gettext $parameter" and "gettext ${parameter}"
will output the translated string at run time, even though 'xgettext'
would discard that command when parsing the script, because 'gettext' is
used instead of 'eval_gettext'. This can be handy. In this case the
parameter expansion should not be escaped.


(2) Mark messages to be localized

I recommend to mark messages:
- arguments of a not redirected 'echo' command
- arguments of redirected 'echo' commands whenever a further processing
  displays it on user's screen
- arguments of other commands which displays the message, for instance
  the 'dialog' program

On the contrary I recommend not to mark
- comments intended for readers of the script,
- text string whose value will be processed later, for instance as
  arguments of a 'case' compound command, or <tag> arguments of a
  'dialog --menu' command.

Sometimes the shell script writes other shell scripts. Then the developer
or maintainer have to decide on a case by case basis what to mark
depending on the intended scope of internationalization.

(3) Use 'xgettext' to produce a template catalog of messages

The choice to produce only one POT file for the software as a whole or to
make one POT files per set of scripts have to be made, considering for
instance which choice will minimize maintenance work, how localization's
work can be organized, relative frequency of updates for the different
sets of scripts which comprise the software, and the relevance of
distinguishing groups of features like setup vs configuration vs
package management. I'm inclined to produce only one POT file, but the
choice is yours.

If the software comprises of numerous scripts located in different places
or included in several packages, it can be handy to collect a copy of all
scripts in a single directory, and/or to register in a text file a list of
all of them with their paths.

The POT file will be generated using the 'xgettext' command (see the
manual or 'xgettext --help' for details).

I suggest to include following options in the command:
-L Shell (of course!)
--strict (to facilitate checks and management of the messages catalogs)
-c       (to include comments useful for the translators in the POT file)
-n       (to identify the source file and the line number of each message.
         This is the default.)

Once the POT file is generated you could check that it includes entries
for all *gettext invocation in shell script(s).


Didier Spaier