mb_ereg

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

mb_ereg - Regular expression match with multibyte support

Description

mb_ereg(string $pattern, string $string, array &$matches = null): bool

Performs a regular expression match with multibyte support.

Parameters

pattern

The search pattern.

string

The string to search.

matches

If matches are found for the parenthesized substrings of pattern and the function is called with the third argument matches, the matches will be stored in the elements of the array matches. If no match is found, matches is set to an empty array.

$matches[1] will contain the substring that starts at the first left parenthesis; $matches[2] will contain the substring starting at the second, and so on. $matches[0] will contain a copy of the complete string matched.
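
As a minimal sketch of how the matches array is filled, the example below pulls the parts out of a date written with Japanese characters. The sample string and variable names are illustrative assumptions; it also assumes the mbstring extension is loaded and the script is saved as UTF-8.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
mb_regex_encoding('UTF-8');

$date = '2025年03月21日'; // hypothetical sample input

if (mb_ereg('([0-9]+)年([0-9]+)月([0-9]+)日', $date, $matches)) {
    echo $matches[0], "\n"; // the complete matched string
    echo $matches[1], "\n"; // first parenthesized group (year)
    echo $matches[2], "\n"; // second parenthesized group (month)
    echo $matches[3], "\n"; // third parenthesized group (day)
}
```

Groups are numbered by the position of their opening parenthesis, so $matches[1] is the year here.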

Return Values

Returns whether a match for pattern was found in string.

Changelog

Version Description
8.0.0 This function now returns true on success. Previously, it returned the byte length of the matched string if a match for pattern was found in string and matches was passed. If the optional parameter matches was not passed, or the length of the matched string was 0, this function returned 1.
7.1.0 mb_ereg() now sets matches to an empty array if nothing matches. Previously, matches was left unmodified in that case.
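
The following sketch illustrates the two behaviors described in this table, assuming PHP 8.0 or later with the mbstring extension loaded. The variable names are illustrative only.

```php
<?php
// Assumption: PHP 8.0 or later with mbstring loaded.

// No match: returns false and resets the array to empty (since 7.1.0).
$missMatches = ['stale value'];
$missFound = mb_ereg('xyz', 'abcde', $missMatches);
var_dump($missFound);
var_dump($missMatches);

// Match: returns true rather than a byte length (since 8.0.0).
$hitFound = mb_ereg('bcd', 'abcde', $hitMatches);
var_dump($hitFound);
var_dump($hitMatches);
```

Code that must also run on PHP 7 can treat the result as truthy or falsy instead of comparing it strictly with true.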

Notes

Note:

The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
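
As a rough illustration of why the regex encoding matters, the sketch below sets it explicitly and then restores the earlier value. It assumes the mbstring extension is loaded and the script is saved as UTF-8; the variable names are hypothetical.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$previous = mb_regex_encoding(); // remember the current regex encoding
mb_regex_encoding('UTF-8');

// Under UTF-8, "." matches one character rather than one byte, so a
// three-character string matches ^.{3}$ even though it is nine bytes long.
$isThreeChars = mb_ereg('^.{3}$', '日本語');
var_dump($isThreeChars);
var_dump(strlen('日本語'));

mb_regex_encoding($previous); // restore the earlier setting
```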

See Also

  • mb_regex_encoding() - Set/Get character encoding for multibyte regex
  • mb_eregi() - Regular expression match ignoring case with multibyte support

User Contributed Notes 12 notes

Anonymous
3 years ago
One difference between preg_match() and mb_ereg() is how they report a captured parenthesized subpattern that matches an empty string.

<?php

preg_match('/(abc)(.*)/', 'abc', $match);
var_dump($match);

mb_ereg('(abc)(.*)', 'abc', $match);
var_dump($match);

?>

array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
string(0) "" // <-- "string"(0) "" : preg_match()
}

array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
bool(false) // <-- "bool"(false) : mb_ereg()
}
Anonymous
7 years ago
The old link to the Oniguruma regex syntax no longer works; here is a working one:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
pressler at hotmail dot de
12 years ago
Note that mb_ereg() does not support the \uFFFF unicode syntax but uses \x{FFFF} instead:

<?PHP

$text = 'Peter is a boy.'; // english
$text = 'بيتر هو صبي.'; // arabic
//$text = 'פיטר הוא ילד.'; // hebrew

mb_regex_encoding('UTF-8');

if(mb_ereg('[\x{0600}-\x{06FF}]', $text)) // arabic range
//if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // hebrew range
{
echo "Text has some arabic/hebrew characters.";
}
else
{
echo "Text doesn't have arabic/hebrew characters.";
}

?>
Anonymous
2 months ago
mb_ereg() cannot match a run of 100,000 (100K) or more characters (characters, not bytes),
whereas preg_match() can match more than 1,000,000,000 (1G), as long as the string fits within "memory_limit".
Try this:

<?php

ini_set("memory_limit", "512M"); // <-- must be changed if you try 1G.
$length = 100000; // <-- 99999 is OK / 100000 is NG

$str = "";
for ($i=0; $i<$length; $i++):
$str .= "1"; // <-- same result if it is a multibyte character.
endfor;

if (mb_ereg('.*', $str)):
echo '<br><span style="background-color:lightgreen">OK!</span><br>memory_limit = '.ini_get("memory_limit").'<br>$length = '.$length;
else:
echo '<br><span style="background-color:orange">NG!</span><br>memory_limit = '.ini_get("memory_limit").'<br>$length = '.$length;
endif;

?>
Anonymous
2 years ago
If adding ".*" at the end of the pattern makes mb_ereg() return "false"
whereas a single "." returns "true",
suspect that the string is too long for the pattern matching.

In this case, preg_match() returns "true" with ".*" at the end of the pattern,
but appending "$" or "\z" after it returns "false" as expected.
Anonymous
3 years ago
When the pattern contains a named subpattern, mb_ereg()
never captures the unnamed subpatterns
(an Oniguruma restriction).

<?php

$str = 'abcdefg';
$patternA = '\A(abcd)(.*)\z'; // both caught [1]abcd [2]efg
$patternB = '\A(abcd)(?<rest>.*)\z'; // non-named 'abcd' never caught

mb_ereg($patternA, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';

mb_ereg($patternB, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
?>

Array
(
[0] => abcdefg
[1] => abcd
[2] => efg
)

Array
(
[0] => abcdefg
[1] => efg
[rest] => efg
)
Anonymous
4 years ago
<?php

# What mb_ereg() returns & changes $_3rd_argument into
# (Just run this script)

function dump2str($var) {
ob_start();
var_dump($var);
$output = ob_get_contents();
ob_end_clean();
return $output;
}

# (PHP7)empty pattern returns bool(false) with Warning
# (PHP8)empty pattern throws ValueError
$emp_ptn = '';
try{
$emp_ptn.= dump2str(mb_ereg('', 'abcde'));
}catch(Exception | Error $e){
$emp_ptn.= get_class($e).'<br>';
$emp_ptn.= $e->getMessage();
$emp_ptn.= '<pre>'.$e->getTraceAsString().'</pre>';
}

echo 'PHP '.phpversion().'<br><br>'.

'# match<br>'.
dump2str(mb_ereg("bcd", "abcde")).
' : mb_ereg("bcd", "abcde")<br><br>'.

'# match with 3rd argument<br>'.
dump2str(mb_ereg("bcd", "abcde", $_3rd)).
' : mb_ereg("bcd", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.

'# match (0 byte)<br>'.
dump2str(mb_ereg("^", "abcde")).
' : mb_ereg("^", "abcde")<br><br>'.

'# match (0 byte) with 3rd argument<br>'.
dump2str(mb_ereg("^", "abcde", $_3rd)).
' : mb_ereg("^", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.

'# unmatch<br>'.
dump2str(mb_ereg("f", "abcde")).
' : mb_ereg("f", "abcde")<br><br>'.

'# unmatch with 3rd argument<br>'.
dump2str(mb_ereg("f", "abcde", $_3rd)).
' : mb_ereg("f", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.

'# empty pattern<br>'.
$emp_ptn.
' : mb_ereg("", "abcde")<br><br>'.

'# empty pattern with 3rd argument<br>'.
$emp_ptn.
' : mb_ereg("", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>';

?>
lastuser at example dot com
6 years ago
I hope this information is shown somewhere on php.net.

According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
the bundled Oniguruma regex library version seems to be:
4.7.1 for PHP 5.3 to 5.4.45,
5.9.2 for PHP 5.5 to 7.1.16,
6.3.0 since PHP 7.2.
mb_ereg() seems unable to Use "named sub
9 years ago
mb_ereg() seems unable to use named subpatterns.
preg_match() seems to be a substitute, but only for UTF-8 encoded text.

<?php

$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5

if(mb_ereg($pattern, $text, $matches)){
echo '<pre>'.print_r($matches, true).'</pre>';
}else{
echo 'no match';
}

?>

This code ignores "?<name>" in $pattern and displays the following.

Array
(
[0] => multi_byte_string
[1] => string
)

$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){

Using these two lines in place of the $pattern assignment and the mb_ereg() condition
displays the following (in UTF-8 encoding).

Array
(
[0] => multi_byte_string
[name] => string
[1] => string
)
Anonymous
5 years ago
<?php

// in PHP_VERSION 7.1

// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int); // int(1)

$int = mb_ereg('ab', '_ab_'); // [2 bytes match]
var_dump($int); // int(1)

$int = mb_ereg('^', '_ab_'); // [0 bytes match]
var_dump($int); // int(1)

$int = mb_ereg('ab', '__'); // [not match]
var_dump($int); // bool(false)

$int = mb_ereg('', '_ab_'); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)

$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)

// Without 3rd argument, mb_ereg() returns either int(1) or bool(false).

// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int); // int(5)
var_dump($regs); // array(1) { [0]=> string(5) "abcde" }

$int = mb_ereg('ab', '_ab_', $regs); // [2 bytes match]
var_dump($int); // int(2)
var_dump($regs); // array(1) { [0]=> string(2) "ab" }

$int = mb_ereg('^', '_ab_', $regs); // [0 bytes match]
var_dump($int); // int(1)
var_dump($regs); // array(1) { [0]=> bool(false) }

$int = mb_ereg('ab', '__', $regs); // [not match]
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }

$int = mb_ereg('', '_ab_', $regs); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }

$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }

// With 3rd argument, mb_ereg() returns either int(how many bytes matched) or bool(false)
// and 3rd argument is a bit complicated.

?>
Riikka K
10 years ago
While hardly mentioned anywhere, it may be useful to note that mb_ereg uses the Oniguruma library internally. The syntax for the default mode (ruby) is described here:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Jon
15 years ago
Hebrew regex tested on PHP 5, Ubuntu 8.04.
Seems to work fine without the mb_regex_encoding lines (commented out).
Didn't seem to work with \uxxxx (also commented out).

<?php
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
if(mb_ereg(".*([א-ת]).*", $this->current_line))
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>