If you are unsure about what $encoding can be set to, here's a full list of all the encodings supported by this extension:
http://www.php.net/manual/en/mbstring.supported-encodings.php
(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)
mb_strlen — Obtiene la longitud de una cadena de caracteres
Obtiene la longitud de un string.
str
El string del que se va a obtener su longitud.
encoding
El parámetro encoding
es la codificación de caracteres. Si es omitido, será usado el valor de la
codificación de caracteres interna.
Devuelve el número de caracteres del
string str
, teniendo éste una codificación de caracteres
dada por encoding
. Un carácter multibyte cuenta
como 1.
Devuelve false
si la codificación dada por encoding
no es válida.
Esta función puede
devolver el valor booleano false
, pero también puede devolver un valor no booleano que se
evalúa como false
. Por favor lea la sección sobre Booleanos para más
información. Use el operador
=== para comprobar el valor devuelto por esta
función.
If you are unsure about what $encoding can be set to, here's a full list of all the encodings supported by this extension:
http://www.php.net/manual/en/mbstring.supported-encodings.php
Speed of mb_strlen varies a lot according to specified character set.
If you need length of string in bytes (strlen cannot be trusted anymore because of mbstring.func_overload) you should use <?php mb_strlen($string, '8bit'); ?>.
It's the fastest way (still a way slower than strlen, though) to determine byte length of string. Other single byte character sets (ASCII, ISO-8859-1, ...) are several times slower than 8bit.
Just did a little benchmarking (1.000.000 times with lorem ipsum text) on the mbs functions
especially mb_strtolower and mb_strtoupper are really slow (up to 100 times slower compared to normal functions). Other functions are alike-ish, but sometimes up to 5 times slower.
just be cautious when using mb_ functions in high frequented scripts.
# test runs: 1000000
# benchmarking strlen vs. mb_strlen
# normal strlen: 3.6795361042023 ms, average: 3.6795361042023E-6 ms
# mb_strlen: 5.5934538841248 ms, average: 5.5934538841248E-6 ms
ok 1 - mb_strlen is slower than strlen
# mb_strlen is 1.52 slower than strlen
#
#
# benchmarking strpos vs. mb_strpos
# normal strpos: 5.5523281097412 ms, average: 5.5523281097412E-6 ms
# mb_strlen: 31.180974960327 ms, average: 3.1180974960327E-5 ms
ok 2 - mb_strlen is slower than strlen
# mb_strpos is 5.62 slower than strpos
#
#
# benchmarking substr vs. mb_substr
# normal substr: 3.4437320232391 ms, average: 3.4437320232391E-6 ms
# mb_strlen: 3.5374181270599 ms, average: 3.5374181270599E-6 ms
ok 3 - mb_strlen is slower than strlen
# mb_substr is 1.03 slower than substr
#
#
# benchmarking strtolower vs. mb_strtolower
# normal strtolower: 4.446839094162 ms, average: 4.446839094162E-6 ms
# mb_strlen: 193.44901108742 ms, average: 0.00019344901108742 ms
ok 4 - mb_strlen is slower than strlen
# mb_strtolower is 43.5 slower than strtolower
#
#
# benchmarking strtoupper vs. mb_strtoupper
# normal strtoupper: 3.0210740566254 ms, average: 3.0210740566254E-6 ms
# mb_strlen: 340.71775603294 ms, average: 0.00034071775603294 ms
ok 5 - mb_strlen is slower than strlen
# mb_strtoupper is 112.78 slower than strtoupper
If you find yourself without the mb string functions and can't easily change it, a quick hack replacement for mb_strlen for utf8 characters is to use a a PCRE regex with utf8 turned on.
$strlen = preg_match_all("/.{1}/us",$utf8string,$dummy);
This is basically an ugly hack which counts all single character matches, and I'd expect it to be painfully slow on large strings.
It may not be clear whether PHP actually supports utf-8, which is the current de facto standard character encoding for Web documents, which supports most human languages. The good news is: it does.
I wrote a test program which successfully reads in a utf-8 file (without BOM) and manipulates the characters using mb_substr, mb_strlen, and mb_strpos (mb_substr should normally be avoided, as it must always start its search at character position 0).
The results with a variety of Unicode test characters in utf-8 encoding, up to four bytes in length, were mostly correct, except that accent marks were always mistakenly treated as separate characters instead of being combined with the previous character; this problem can be worked around by programming, when necessary.