
Strings

IS4 edited this page Oct 26, 2024 · 12 revisions

This plugin introduces dynamically allocated mutable strings (of cells). These strings are manipulated using their addresses in memory (pointers), tagged either String or ConstString (more on the difference later). It is also possible to pass such a string to (almost) any native function without intermediate copying of the characters.

Usage

PawnPlus supports different ways of creating dynamic strings, each made for a different purpose. The first way of creating a dynamic string is using @ (an alias of str_new_static) on a literal string expression:

new String:str = @("Dynamic string");

str_new_static has an additional parameter that defaults to the size of the array, so the length of the string doesn't have to be computed. This means that the string can contain null characters: every cell up to the last one (the terminating null character) is treated as part of the string. Use str_new instead if the string is shorter than the array that contains it.
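The difference matters most for an array that is larger than its contents. The following is a minimal sketch, assuming (as the text above implies) that str_new measures the string up to the first null cell; the buffer name and size are made up for illustration:

```pawn
#include <a_samp>
#include <PawnPlus>

public OnFilterScriptInit()
{
    // The array has 32 cells, but only the first 5 hold characters.
    new buffer[32] = "short";

    // str_new takes the length from the contents, not from sizeof(buffer),
    // so the unused cells at the end do not become part of the string.
    new String:str = str_new(buffer);
    printf("%d", str_len(str));
    return 1;
}
```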

A dynamic string exists independently of its source, and its data is not bound to any AMX machine or script. It can be passed to any function, or returned from one:

stock String:GetHalf(ConstStringTag:str)
{
    return str_sub(str, 0, str_len(str)/2);
}

Strings (with one exception) are always mutable. This means that functions can modify the characters of the string, so it is useful to mark functions that don't modify the string by using ConstString (like you would use a const array). str_sub creates a new string, but the GetHalf function could also be written to modify the string in place:

stock String:MakeHalf(StringTag:str)
{
    return str_del(str, str_len(str)/2);
}

Instead of returning a substring, str_del deletes the second half of the string and returns the same string instance it was provided with. Therefore, this code is correct:

new String:str1 = @("Hello");
new String:str2 = GetHalf(str1); // creates a new instance
assert(str1 == @("Hello")); // by-value equality
assert(str2 == @("He"));
new String:str3 = MakeHalf(str1); // keeps the same instance
assert(_:str1 == _:str3); // by-reference equality (identity)
assert(str3 == @("He"));

For convenience, there are three operators defined on dynamic strings: + (concatenation, routed to str_cat), == (by-value equality, routed to str_eq), and % (concatenation, more details below). Using other operators on strings is strictly prohibited since it's most likely a mistake.

The Pawn compiler may reorder the arguments of + when they do not all have the same tag, so % has to be used in that case. Integers and floats can also be implicitly converted to strings when they are used in a string position, but this behaviour must be enabled by defining PP_SYNTAX_STRING_OP.

You can copy the contents of the string easily back to a buffer:

new String:str = @("Dynamic string");
new buffer[16];
str_get(str, buffer);
print(buffer);

Native interoperability

Almost any native function can be changed so that it takes a dynamic string instead of a character array. Let's start with a simple function like print:

native print(const string[]);

The native function expects an address of a string inside the AMX machine's memory. However, this plugin enables you to pass it an address outside the machine and it will interpret it as a string if possible. The modification is simple:

native print_s(ConstAmxString:string) = print;

The tag must be AmxString or ConstAmxString instead of String because the address itself must be relative to the abstract machine's memory. Internally, str_addr or str_addr_const is called for the conversion, which returns the offset address. The = allows changing the name of a native in the script while still referring to the same function.

new String:str1 = @("Hello ");
new String:str2 = @("world!");
print_s(str1+str2);

PawnPlus already defines print_s for convenience.
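The same pattern carries over to other natives that only read their string argument. As a sketch, here is the standard SA-MP native SendClientMessage wrapped in the same way; the name SendClientMessage_s is hypothetical and chosen only to mirror print_s:

```pawn
#include <a_samp>
#include <PawnPlus>

// Hypothetical wrapper name; SendClientMessage itself is the standard SA-MP native.
native SendClientMessage_s(playerid, color, ConstAmxString:message) = SendClientMessage;

public OnPlayerConnect(playerid)
{
    // The concatenated string is passed without copying it into a local array.
    new String:greeting = @("Welcome to ") + @("the server!");
    SendClientMessage_s(playerid, 0xFFFFFFFF, greeting);
    return 1;
}
```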

Unfortunately, printf cannot be modified in such a way, because it doesn't use the standard AMX API to access its parameters. For variadic functions (with ...), the conversion to AmxString is not done automatically and must be done manually:

native CallLocalFunctionStr(const function[], const format[], {AmxString,Float,_}:...) = CallLocalFunction;

public OnFilterScriptInit()
{
    new String:str1 = @("Hello ");
    new String:str2 = @("world!");
    pp_hook_check_ref_args(true); // required for the result of str_addr to be picked up
    CallLocalFunctionStr(#StringReceiver, "s", str_addr(str1+str2));
}

forward StringReceiver(str[]);
public StringReceiver(str[])
{
    print(str);
}

If str_addr hadn't been used, the compiler would issue a warning, but wouldn't attempt to convert the value. This is also the second way to extract the contents of a dynamically allocated string, and it doesn't require knowing the size of the buffer.

Using dynamic strings is safe if the function uses the string as its input and doesn't modify the contents (usually coupled with const in the declaration), but it is also possible to use them as buffers for functions that modify the contents.

Converting these functions is not always simple or consistent, because while standard Pawn functions and plugins use amx_GetAddr, SA-MP doesn't use it for output strings, and it computes the pointer from the address directly, without any checks. "Well-behaved" functions can be converted in the standard way, but the size of the string must be taken into account:

native strcat_s_impl(AmxString:dest, const source[], maxlength) = strcat;
stock strcat_s(StringTag:dest, const source[])
{
    return strcat_s_impl(dest, source, str_len(dest) + 1);
}

The correct size is indeed str_len(dest) + 1, because the actual buffer includes the null character. Unfortunately, calling this produces an error, since strcat itself checks the validity of the address it obtains. This cannot be circumvented without relying on the memory layout of the executable, but only the standard library functions do this.

Normal SA-MP functions access the address directly, which means that the address of the actual character data must be passed. This is represented by the AmxStringBuffer: tag:

native GetPlayerNameStrImpl(playerid, AmxStringBuffer:name, len) = GetPlayerName;
stock String:GetPlayerNameStr(playerid)
{
    new String:str = str_new_buf(MAX_PLAYER_NAME);
    str_resize(str, GetPlayerNameStrImpl(playerid, str, MAX_PLAYER_NAME));
    return str;
}

GetPlayerNameStrImpl can be called directly, but for convenience, a function that creates the output string automatically should be used. str_new_buf(size) creates a new string and sets its length to size - 1. The conversion to AmxStringBuffer: returns the address of the characters, which is guaranteed to be a block of memory of at least size bytes (including the null character). GetPlayerName then writes directly into the buffer and returns the number of characters written, which is then used to truncate the string (null characters aren't used to determine the size of dynamic strings).

Since AmxStringBuffer: is the actual address of the characters, you can add bytes to it to produce a pointer into the middle of the string. Currently, this relies on the number of bytes, but it may be changed to the number of cells in the future, so it should not be relied upon:

new String:str = @("My name is _______________________");
GetPlayerNameStrImpl(playerid, str_buf_addr(str)+44, MAX_PLAYER_NAME);
print_s(str);
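The offset of 44 in the call above is the length of the prefix "My name is " (11 cells) multiplied by the size of a cell in bytes. The sketch below spells out that arithmetic using the predefined Pawn constant cellbits; the function and constant names are hypothetical, and the same caveat applies: the byte-based offset may change in the future.

```pawn
#include <a_samp>
#include <PawnPlus>

native GetPlayerNameStrImpl(playerid, AmxStringBuffer:name, len) = GetPlayerName;

// Hypothetical helper that writes the player's name after a fixed prefix.
stock ShowNameInPlace(playerid)
{
    // The prefix "My name is " occupies 11 cells.
    const PREFIX_CELLS = 11;
    // cellbits / 8 is the size of one cell in bytes (4 for 32-bit cells).
    const PREFIX_BYTES = PREFIX_CELLS * (cellbits / 8);

    new String:str = @("My name is _______________________");
    GetPlayerNameStrImpl(playerid, str_buf_addr(str) + PREFIX_BYTES, MAX_PLAYER_NAME);
    print_s(str);
}
```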

The null string

There is a special string value, STRING_NULL, which can be used in all functions but is always empty and cannot be modified (unless the modification would result in an empty string). It also has a special behaviour when used as an argument for variadic functions:

public OnFilterScriptInit()
{
    CallLocalFunctionStr(#Func, "s", str_addr(STRING_NULL));
}

forward Func(str[]);
public Func(str[])
{
    printf("%d", str[0]); //1
}

Since these functions generally crash when passed an empty string, STRING_NULL is converted to "\1;" instead of an empty string when it is passed to them.

String lifetime and garbage collection

PawnPlus employs a mechanism similar to garbage collection for strings and other objects. However, because it cannot reliably scan the memory to find out whether a string is still in use, it needs hints to know when a string is "owned", in the form of str_acquire and str_release. More information about this mechanism here.
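As an illustration, a string stored in a global variable needs to be marked as owned. This is a minimal sketch, assuming that a string nobody has acquired may be collected once the callback that created it returns; the variable name is made up:

```pawn
#include <a_samp>
#include <PawnPlus>

new String:g_greeting;

public OnFilterScriptInit()
{
    g_greeting = @("Welcome!");
    // Mark the string as owned so that it outlives this callback.
    str_acquire(g_greeting);
    return 1;
}

public OnPlayerConnect(playerid)
{
    // Safe to use here because the string was acquired earlier.
    print_s(g_greeting);
    return 1;
}

public OnFilterScriptExit()
{
    // Give up ownership; the string becomes eligible for collection again.
    str_release(g_greeting);
    return 1;
}
```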

Null characters

Strings in Pawn (and SA-MP) are null-terminated, meaning that the string end is located at the first zero cell. There are two ways to get around this problem – store the length together with the string, or use another character:

new str[] = "A\256;bit longer";
printf("%d %s", strlen(str), str); //12 A

256 (0x100) does not fit into a byte, and so it is truncated into a null character when displayed, but functions like strlen check cells and not bytes. By default, str_new respects the original cells of the string and does not change them:

new String:str = str_new("A\256;bit longer");
printf("%d %d", str_len(str), str_getc(str, 1)); //12 256

In some cases, you might want to represent the string as it was intended (i.e. with a proper null character). In this case, pass str_truncate as the second argument to str_new:

new String:str = str_new("A\256;bit longer", str_truncate);
printf("%d %d", str_len(str), str_getc(str, 1)); //12 0

str_truncate will truncate all cells to a single byte. However, there is now a problem in SA-MP functions that use amx_StrLen to compute the length of the string. To fix it, this plugin also hooks the function, but since this affects almost any call to a native function taking a string, you might want to disable the hook:

native strlen_s(AmxString:string) = strlen;

public OnFilterScriptInit()
{
    new String:str = str_new("A\256;bit longer", str_truncate);
    printf("%d", strlen_s(str)); //12
    pp_hook_strlen(false);
    printf("%d", strlen_s(str)); //1
}

Strings as arrays and arrays as strings

Since dynamic strings can hold any number of arbitrary cells, they can effectively also store standard arrays, although the contents are not easily accessed:

enum STRUCT
{
    S_FIELD1,
    Float:S_FIELD2,
    S_FIELD3[16]
}

public OnFilterScriptInit()
{
    new data[STRUCT];
    data[S_FIELD1] = -1729;
    data[S_FIELD2] = 1.618034;
    data[S_FIELD3] = "abcdefghijklmno";
    
    new String:str = str_new_arr(data[STRUCT:0], _:STRUCT);
    printf("%d", str_getc(str, 0)); //-1729
    
    new data2[_:STRUCT + 1];
    str_get(str, data2);
    
    print(data2[_:S_FIELD3]); //abcdefghijklmno
}

Variants are more suited for storing standard (tagged) arrays, and lists and maps are better for complex objects.

Packed strings

Strings in Pawn can be stored as packed or as unpacked. Unpacked strings store each character in its own cell, while packed strings store them more efficiently, with 4 characters in a single cell. So that any function can determine whether a string is packed or unpacked, packed characters start at the most significant byte of a cell. Therefore, a dynamic string (which is always unpacked) that starts with a cell that is negative or larger than 0xFFFFFF will not be recognized correctly when passed to a native function.
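The rule above can be checked before a string is handed to a native. The helper below is a hypothetical sketch that mirrors the description; it is not part of PawnPlus:

```pawn
#include <a_samp>
#include <PawnPlus>

// Hypothetical helper: returns true if a first cell with this value would make
// an unpacked string look packed (its most significant byte is non-zero).
stock bool:WouldLookPacked(firstCell)
{
    return firstCell < 0 || firstCell > 0xFFFFFF;
}

public OnFilterScriptInit()
{
    new String:str = @("Hello");
    if (WouldLookPacked(str_getc(str, 0)))
    {
        print("This string would be misread as packed by native functions.");
    }
    return 1;
}
```

Ordinary text characters fit in the lowest byte, so the check only triggers for strings that hold arbitrary cell data, such as the array example in the previous section.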
