Configuring the locale for language and encoding‐aware operations
PawnPlus can make use of the system's cultural settings (the "locale") through the mechanisms exposed by std::locale in C++, which are used for formatting, character conversion, and comparison.
When loaded, the plugin sets the global locale (via std::locale::global) to the invariant one (std::locale::classic, commonly identified as "C" or "POSIX"), so any locale previously set by the server or through environment variables is ignored. The global locale can then be modified through pp_locale. Note that other C++ modules may share the same global locale, so this setting affects them as well. The C locale (used by modules written in C and set by std::setlocale) is not affected.
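As an illustration of the mechanism described above, the following is a minimal C++ sketch of resetting the global locale to the invariant one. It relies only on the standard library; the helper names reset_to_invariant and global_locale_name are hypothetical, and the code is not taken from the PawnPlus source.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: installs the invariant ("C") locale as the global
// C++ locale and returns the name of the locale that was active before.
std::string reset_to_invariant()
{
    // std::locale::global returns the previously installed global locale.
    std::locale previous = std::locale::global(std::locale::classic());
    return previous.name();
}

// Returns the name of the current global C++ locale. A default-constructed
// std::locale is a copy of the global one.
std::string global_locale_name()
{
    return std::locale().name();
}
```

After reset_to_invariant() has run, global_locale_name() reports "C", which is the state a script can expect right after the plugin is loaded.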
The locale can be applied or changed for any number of distinct categories, represented by locale_category:
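// Bit flags: each constant without an explicit value is the previous one shifted left by 1.
enum locale_category (<<= 1)
{
    locale_none = 0,
    locale_collate = 1,
    locale_ctype,    // 2
    locale_monetary, // 4
    locale_numeric,  // 8
    locale_time,     // 16
    locale_messages, // 32
    locale_all = -1, // all bits set
}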
These categories affect the following areas of the plugin:
- locale_collate controls character equivalence and comparisons. It is used only for regular expressions when regex_collate is set. In such a case, character ranges (e.g. [a-z]) will use the order of characters imposed by the locale.
- locale_ctype specifies character categories (letter, digit, etc.) as well as lowercase and uppercase conversions. It is used for str_to_lower/str_set_to_lower, str_to_upper/str_set_to_upper, and regular expressions, either when character classes like \s or [[:alpha:]] are used, or with regex_icase.
- locale_numeric defines how numbers are formatted, for example which character is used for the decimal point (e.g. . or ,). It is used by str_format and similar functions, including tag_op_string and tag_op_format.
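The effect of applying a locale to only some categories can be sketched in C++ with the standard library alone. The example below is an assumption-laden illustration rather than PawnPlus code: comma_decimal is a hypothetical facet standing in for a system locale that uses a decimal comma, so the sketch does not depend on which locales are installed, and format_with and make_comma_locale are hypothetical helper names.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that uses ',' as the decimal point, standing in for a
// system locale (such as a Czech one) that formats numbers this way.
struct comma_decimal : std::numpunct<char>
{
    char do_decimal_point() const override { return ','; }
};

// Builds a locale identical to the invariant one except for numeric punctuation.
std::locale make_comma_locale()
{
    // The locale takes ownership of the facet and deletes it when unused.
    return std::locale(std::locale::classic(), new comma_decimal);
}

// Formats a value using only those facets of `source` that belong to the
// categories in `cats`; every other category comes from the invariant locale.
std::string format_with(const std::locale &source, std::locale::category cats, double value)
{
    std::locale combined(std::locale::classic(), source, cats);
    std::ostringstream out;
    out.imbue(combined);
    out << value;
    return out.str();
}
```

Calling format_with(make_comma_locale(), std::locale::numeric, 3.5) yields "3,5", while passing std::locale::ctype instead leaves the numeric category invariant and yields "3.5". Categories can be combined with the bitwise OR operator, in the same way as the locale_category flags.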
To make the script encoding-aware, only locale_ctype is necessary. Since PawnPlus uses cell strings (with 32-bit instead of 8-bit characters), only the ANSI character range (0 to 255) has any special treatment. Values outside this range remain unassigned, so Unicode characters stored as such values receive no special support.
The pp_locale function needs a pre-existing locale name. It may be empty ("") to use the system's native locale, "C" to use the invariant locale, or any system-provided locale name, which can be found on POSIX systems by running locale -a.
On Windows, the locale name uses the formats <language>, <language>-<REGION>, <language>-<Script>, or <language>-<Script>-<REGION>. A locale corresponds to a particular code page used when interpreting ANSI text. An overview of common locales and their code pages can be found here.
As an example, the locale name cs-CZ corresponds to the Czech language and regional settings, and uses the encoding Windows-1250.
On Linux, the set of supported locales can be extended by running the localedef command with pre-existing language and character mapping definitions.
For example, localedef -i cs_CZ -f CP1250 cs_CZ.CP1250 creates a new locale named cs_CZ.CP1250 using the Windows-1250 encoding.
When called with a nonexistent locale name, pp_locale raises an error. It can be used together with pawn_try_call_native to attempt to set the locale and warn if none is found:
new result;
// Try the Windows-style name first, then the POSIX-style name; warn only if both attempts fail.
if(
    (pawn_try_call_native(pawn_nameof(pp_locale), result, "sd", "cs-CZ", locale_ctype) != amx_err_none) &&
    (pawn_try_call_native(pawn_nameof(pp_locale), result, "sd", "cs_CZ.cp1250", locale_ctype) != amx_err_none)
)
{
    print("Warning: No character locale data can be set!");
}
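The same "try several names, warn if none works" pattern can be sketched at the C++ level, where constructing a std::locale from an unknown name throws std::runtime_error. This is only an illustration of the underlying standard library behavior; first_available_locale is a hypothetical helper and not part of PawnPlus.

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first name in `candidates`
// for which a std::locale can be constructed, or -1 if none is available.
int first_available_locale(const std::vector<std::string> &candidates)
{
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        try
        {
            // Throws std::runtime_error if the system does not know the name.
            std::locale probe(candidates[i]);
            return static_cast<int>(i);
        }
        catch (const std::runtime_error &)
        {
            // Not available on this system; try the next candidate.
        }
    }
    return -1;
}
```

A caller could pass a list such as "cs-CZ" followed by "cs_CZ.cp1250" and print a warning when the result is -1, which mirrors the intent of the script above.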