(PHP 5 >= 5.5.0, PHP 7, PHP 8, PECL >= 3.0.0a1)
UConverter::transcode — Convert a string from one character encoding to another
UConverter::transcode($str, $toEncoding, $fromEncoding, $options = null)
Converts str from fromEncoding to toEncoding.
str
The string to be converted.
toEncoding
The desired encoding of the result.
fromEncoding
The current encoding used to interpret str.
options
An optional array, which may contain the following keys:
'to_subst' - the substitution character to use in place of any character of str which cannot be encoded in toEncoding. If specified, it must represent a single character in the target encoding.
Example #1 Converting from UTF-8 to UTF-16 and back
<?php
$utf8_string = "\x5A\x6F\xC3\xAB"; // 'Zoë' in UTF-8
$utf16_string = UConverter::transcode($utf8_string, 'UTF-16BE', 'UTF-8');
echo bin2hex($utf16_string), "\n";
$new_utf8_string = UConverter::transcode($utf16_string, 'UTF-8', 'UTF-16BE');
echo bin2hex($new_utf8_string), "\n";
?>
The above example will output:
005a006f00eb
5a6fc3ab
Example #2 Invalid characters in input
If the input string contains a sequence of bytes which is not valid in the encoding specified by fromEncoding, that sequence is replaced by Unicode code point U+FFFD (Replacement Character) before converting to toEncoding.
<?php
$invalid_utf8_string = "\xC3"; // incomplete multi-byte UTF-8 sequence
$utf16_string = UConverter::transcode($invalid_utf8_string, 'UTF-16BE', 'UTF-8');
echo bin2hex($utf16_string), "\n";
?>
The above example will output:
fffd
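Because invalid input is replaced rather than rejected, a caller may want to find out whether any substitution took place. The following is a minimal sketch and not part of the UConverter API: the helper name hasInvalidSequences() is hypothetical, and it assumes that transcoding to UTF-8 marks invalid input with U+FFFD (bytes EF BF BD), consistent with the example above. An input that legitimately contains U+FFFD would also be reported.

```php
<?php
// Hypothetical helper, not provided by the intl extension.
// Returns true if decoding $bytes as $fromEncoding yields U+FFFD,
// which suggests the input contained invalid byte sequences.
function hasInvalidSequences($bytes, $fromEncoding)
{
    // UTF-8 can represent U+FFFD, so the marker survives the conversion.
    $utf8 = UConverter::transcode($bytes, 'UTF-8', $fromEncoding);
    if ($utf8 === false) {
        throw new RuntimeException('Transcoding failed: ' . intl_get_error_message());
    }
    // "\xEF\xBF\xBD" is U+FFFD encoded as UTF-8.
    return strpos($utf8, "\xEF\xBF\xBD") !== false;
}

var_dump(hasInvalidSequences("\xC3", 'UTF-8'));             // incomplete sequence
var_dump(hasInvalidSequences("\x5A\x6F\xC3\xAB", 'UTF-8')); // 'Zoë', valid UTF-8
```

Converting to UTF-8 for the check is a deliberate choice in this sketch: a target encoding that cannot map U+FFFD would turn the marker into its own substitution character and hide it.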
Example #3 Characters which cannot be encoded
If the input string contains characters which cannot be represented in toEncoding, each of them is replaced with a single substitution character. The default substitution character depends on the encoding, and can be controlled using the 'to_subst' option.
<?php
$utf8_string = "\xE2\x82\xAC"; // € (Euro Sign) does not exist in ISO 8859-1
// Default replacement in ISO 8859-1 is "\x1A" (Substitute)
$iso8859_1_string = UConverter::transcode($utf8_string, 'ISO-8859-1', 'UTF-8');
echo bin2hex($iso8859_1_string), "\n";
// Specify a replacement of '?' ("\x3F") instead
$iso8859_1_string = UConverter::transcode(
    $utf8_string, 'ISO-8859-1', 'UTF-8', ['to_subst' => '?']
);
echo bin2hex($iso8859_1_string), "\n";
// Since ISO 8859-1 cannot map U+FFFD, invalid input is also replaced by to_subst
$invalid_utf8_string = "\xC3"; // incomplete multi-byte UTF-8 sequence
$iso8859_1_string = UConverter::transcode(
    $invalid_utf8_string, 'ISO-8859-1', 'UTF-8', ['to_subst' => '?']
);
echo bin2hex($iso8859_1_string), "\n";
?>
The above example will output:
1a
3f
3f
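Since unencodable characters are substituted silently, one way to test in advance whether a string fits a target encoding is to convert it there and back and compare the result with the original. This is only a sketch under stated assumptions: the helper name isRepresentableIn() is hypothetical, the input is assumed to be UTF-8, and a failed conversion is treated as "not representable".

```php
<?php
// Hypothetical helper, not provided by the intl extension.
// Returns true if $utf8 survives a round trip through $encoding unchanged.
function isRepresentableIn($utf8, $encoding)
{
    $encoded = UConverter::transcode($utf8, $encoding, 'UTF-8');
    if ($encoded === false) {
        return false; // treat a failed conversion as not representable
    }
    $decoded = UConverter::transcode($encoded, 'UTF-8', $encoding);
    // Any substituted character makes the round trip differ from the input.
    return $decoded === $utf8;
}

var_dump(isRepresentableIn("\x5A\x6F\xC3\xAB", 'ISO-8859-1')); // 'Zoë'
var_dump(isRepresentableIn("\xE2\x82\xAC", 'ISO-8859-1'));     // '€' (Euro Sign)
```

The comparison also returns false for input that is not valid UTF-8, because the replaced bytes no longer match the original.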