Commit 7c4b5fb

add details about php8.1 mb-detect-encoding unordered encodings (php#2426)
* add details about php8.1 being unordered
* simplify message about exclusions
* add comment to example
1 parent dcf89d1 commit 7c4b5fb

1 file changed: +11 -5 lines changed

reference/mbstring/functions/mb-detect-encoding.xml

Lines changed: 11 additions & 5 deletions
@@ -16,7 +16,11 @@
 </methodsynopsis>
 <para>
 Detects the most likely character encoding for <type>string</type> <parameter>string</parameter>
-from an ordered list of candidates.
+from a list of candidates.
+</para>
+<para>
+As of PHP 8.1 this function uses heuristics to detect which of the valid text encodings in the specified
+list is most likely to be correct and may not be in order of <parameter>encodings</parameter> provided.
 </para>
 <para>
 Automatic detection of the intended character encoding can never be entirely reliable;
@@ -27,7 +31,7 @@
 <para>
 This function is most useful with multibyte encodings, where not all sequences of
 bytes form a valid string. If the input string contains such a sequence, that
-encoding will be rejected, and the next encoding checked.
+encoding will be rejected.
 </para>
 
 <warning>
@@ -58,7 +62,7 @@
 <term><parameter>encodings</parameter></term>
 <listitem>
 <para>
-A list of character encodings to try, in order. The list may be specified as
+A list of character encodings to try. The list may be specified as
 an array of strings, or a single string separated by commas.
 </para>
 <para>
@@ -223,8 +227,9 @@ string(10) "ISO-8859-1"
 <?php
 $str = "\xC4\xA2";
 
-// The string is valid in all three encodings, so the first one listed will be returned
-var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5']));
+// The string is valid in all three encodings, but the first one listed may not always be the one returned
+var_dump(mb_detect_encoding($str, ['UTF-8']));
+var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5'])); // as of php8.1 this returns ISO-8859-1 instead of UTF-8
 var_dump(mb_detect_encoding($str, ['ISO-8859-1', 'ISO-8859-5', 'UTF-8']));
 var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
 ?>
@@ -235,6 +240,7 @@ var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
 <![CDATA[
 string(5) "UTF-8"
 string(10) "ISO-8859-1"
+string(10) "ISO-8859-1"
 string(10) "ISO-8859-5"
 ]]>
 </screen>
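
The diff above documents that, as of PHP 8.1, the result of mb_detect_encoding() may not follow the order of the encodings list. For callers who need a fixed priority order, one possible approach is to test each candidate for validity in turn. This is a minimal sketch, assuming that validity alone (not likelihood) should decide the result; `first_valid_encoding` is a hypothetical helper and is not part of mbstring or of this commit.

```php
<?php
// Hypothetical helper (not part of mbstring): returns the first encoding,
// in the caller's order, for which the string is a valid byte sequence.
function first_valid_encoding(string $str, array $encodings): string|false
{
    foreach ($encodings as $encoding) {
        // mb_check_encoding() only validates; it does not guess likelihood.
        if (mb_check_encoding($str, $encoding)) {
            return $encoding;
        }
    }
    return false; // no candidate accepted the string
}

$str = "\xC4\xA2"; // same sample string as in the documentation example

// UTF-8 is listed first and the bytes are valid UTF-8, so it is returned.
var_dump(first_valid_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5']));
?>
```

With this approach the order of the list matters: single-byte encodings such as ISO-8859-1 generally accept any byte sequence, so they would normally be placed last.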
