Note that mb_ereg() does not support the \uFFFF Unicode escape syntax; use \x{FFFF} instead:
<?php
// Only the last uncommented assignment takes effect.
//$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // Arabic
//$text = 'פיטר הוא ילד.'; // Hebrew
mb_regex_encoding('UTF-8');
if (mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic block (U+0600 to U+06FF)
//if (mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew block (U+0590 to U+05FF)
{
    echo "Text has some Arabic/Hebrew characters.";
}
else
{
    echo "Text doesn't have Arabic/Hebrew characters.";
}
?>
mb_ereg
(PHP 4 >= 4.2.0, PHP 5)
mb_ereg — Regular expression match with multibyte support
Description
int mb_ereg ( string $pattern , string $string [, array $regs ] )
Executes the regular expression match with multibyte support.
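Parameters
pattern: the regular expression to search for.
string: the string to search.
regs: optional; if given, it receives the substrings of string that matched.
Return Values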
Executes the regular expression match with multibyte support and returns 1 if a match is found.
If the optional regs parameter is specified, the function instead returns the byte length of the
matched part, and the array regs will contain the matched substrings of string.
If the matched part is the empty string, the function returns 1 (rather than a length of 0).
If no match is found or an error occurs, FALSE is returned.
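A minimal sketch of the regs behaviour described above, assuming UTF-8 source text and an mbstring build with regex support; the subject string is made up for illustration. As far as I know, PHP 8.0 changed mb_ereg() to return true on a match instead of a byte length, so the code only tests the result against FALSE.

```php
<?php
// Assumes mbstring was built with regex (Oniguruma) support.
mb_regex_encoding('UTF-8');

$subject = 'Grüße aus München'; // illustrative input
$regs = array();

// One parenthesised group: $regs[0] receives the whole match,
// $regs[1] the text captured by the group.
$result = mb_ereg('(M[^ ]+)', $subject, $regs);

if ($result !== false) {
    // The PHP 5 behaviour documented here returns the byte length of the
    // match: 8, because "ü" takes two bytes in UTF-8. PHP 8 returns true.
    var_dump($result);
    echo $regs[0], "\n"; // München
    echo $regs[1], "\n"; // München
} else {
    echo "no match\n";
}
```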
Notes
Note:
The internal encoding, or the character encoding specified by mb_regex_encoding(), will be used as the character encoding for this function.
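A short sketch of setting the regex encoding explicitly instead of relying on the internal encoding, then restoring the previous setting. It reuses the Arabic range from the note at the top of the page; the sample word is only illustrative.

```php
<?php
// mb_regex_encoding() with no argument returns the current regex encoding.
$previous = mb_regex_encoding();

// From here on, patterns and subjects are treated as UTF-8.
mb_regex_encoding('UTF-8');
$hasArabic = mb_ereg('[\x{0600}-\x{06FF}]', 'بيتر') !== false;

// Put back whatever encoding was active before.
mb_regex_encoding($previous);

echo $hasArabic ? "contains Arabic\n" : "no Arabic\n";
```

Restoring the old value keeps the change from leaking into other code that calls the mb_ereg* functions.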
See Also
- mb_regex_encoding() - Set/Get character encoding for multibyte regex
- mb_eregi() - Regular expression match ignoring case with multibyte support
pressler at hotmail dot de
6 months ago
arash at hemmat dot biz
3 years ago
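I could easily replace every non-Persian (Farsi) character using mb_ereg_replace(). Arabic and Persian share the same Unicode block, so this code works for Arabic too.
<?php mb_regex_encoding('UTF-8'); // U+0600 to U+06FF: Arabic block, which includes Persian letters
$string = mb_ereg_replace('[^\x{0600}-\x{06FF}]', '-', $string); ?>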
The Unicode code charts list the character range of each script:
http://unicode.org/charts/
Jon
4 years ago
Hebrew regex tested on PHP 5, Ubuntu 8.04.
It seems to work fine without the mb_regex_encoding() lines (commented out).
It didn't seem to work with \uxxxx escapes (also commented out).
<?php
$line = 'שלום world'; // the line to test; inside a class this could be $this->current_line
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if (mb_ereg(".*([\u05d0-\u05ea]).*", $line)) // \uxxxx escapes are not recognised
if (mb_ereg(".*([א-ת]).*", $line)) // any letter from alef to tav
{
    echo "has";
}
else
{
    echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>
