# preg_match

(PHP 4, PHP 5, PHP 7)

preg_match - Perform a regular expression match

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )

Searches subject for a match to the regular expression given in pattern.

### Parameters

- pattern: The pattern to search for, as a string.
- subject: The input string.
- matches: If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will contain the text that matched the first captured parenthesized subpattern, and so on.
- flags: The flags parameter can take the following value:
  - PREG_OFFSET_CAPTURE: If this flag is passed, the offset of every matched substring is returned as well. Note that this changes the value of matches into an array where every element is itself an array, holding the matched string at index 0 and its offset within subject at index 1.
- offset: Normally, the search starts from the beginning of the subject string. The optional offset parameter can be used to specify another position from which to start the search (in bytes).

Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>

The above example will output:

Array
(
)


while this example

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

will produce:

Array
(
[0] => Array
(
[0] => def
[1] => 0
)

)
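
The substr() call can also be avoided by tying the pattern to the offset instead of to the start of the subject. The following is a minimal sketch, assuming the goal is to test whether "def" begins exactly at byte offset 3; the variable names are illustrative only.

```php
<?php
$subject = "abcdef";

// \G asserts that the match starts exactly at the offset passed to preg_match()
$viaG = preg_match('/\Gdef/', $subject, $m1, PREG_OFFSET_CAPTURE, 3);

// The A (anchored) modifier constrains the whole pattern in the same way
$viaA = preg_match('/def/A', $subject, $m2, PREG_OFFSET_CAPTURE, 3);

// Offsets are still reported relative to the full subject string
echo $viaG, ' ', $m1[0][1], "\n";
echo $viaA, ' ', $m2[0][1], "\n";
```

Both calls should report a match at offset 3, unlike the ^ anchor in the first example above.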


### Return Values

preg_match() returns 1 if the pattern matches the given subject, 0 if it does not, or FALSE if an error occurred.

Warning

This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.
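
To make the warning concrete, here is a minimal sketch of a strict check on the return value. The helper name match_status() is hypothetical, and the error case relies on the assumption that a subject containing invalid UTF-8 makes a pattern with the u modifier fail.

```php
<?php
/**
 * Hypothetical helper: classifies the result of preg_match() as
 * 'match', 'no match' or 'error' using strict comparison.
 */
function match_status(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // An error occurred; preg_last_error() tells which one
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}

echo match_status('/php/i', 'PHP'), "\n";
echo match_status('/php/i', 'Perl'), "\n";
// Invalid UTF-8 in the subject makes a /u pattern fail (PREG_BAD_UTF8_ERROR)
echo match_status('/./u', "\xFF"), "\n";
```

A loose comparison such as `== false` would not distinguish the "no match" case (0) from the error case (FALSE).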

### Changelog

| Version | Description |
|---|---|
| 5.3.6 | Returns FALSE if offset is greater than the length of subject. |
| 5.2.2 | Named subpatterns now accept the syntax (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>). |

### Examples

Example #1 Find the string "php"

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the best web scripting language.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example #2 Find the word "web"

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the best web scripting language.")) {
    echo "The word was found.";
} else {
    echo "The word was not found.";
}
?>

Example #3 Getting the domain name out of a URL

<?php
// get the host name from the URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get the last two segments of the host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "The domain name is: {$matches[0]}\n";
?>

The above example will output:

The domain name is: php.net

Example #4 Using named subpatterns

<?php
$str = 'foobar: 2008';

preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

/* This also works in PHP 5.2.2 (PCRE 7.0) and later; however,
 * the form above is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

print_r($matches);
?>

The above example will output:

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)

### Notes

Tip

Do not use preg_match() if you only want to check whether one string is contained in another. Use strpos() or strstr() instead, as they are much faster.

### User Contributed Notes

force at md-t dot org
5 years ago
Simple regex

Regex quick reference

- [abc]     A single character: a, b or c
- [^abc]    Any single character but a, b, or c
- [a-z]     Any single character in the range a-z
- [a-zA-Z]  Any single character in the range a-z or A-Z
- ^         Start of subject (start of each line with the m modifier)
- $         End of subject (end of each line with the m modifier)
- \A        Start of string
- \z        End of string
- .         Any single character (except newline, unless the s modifier is set)
- \s        Any whitespace character
- \S        Any non-whitespace character
- \d        Any digit
- \D        Any non-digit
- \w        Any word character (letter, number, underscore)
- \W        Any non-word character
- \b        A word boundary (a position, not a character)
- (...)     Capture everything enclosed
- (a|b)     a or b
- a?        Zero or one of a
- a*        Zero or more of a
- a+        One or more of a
- a{3}      Exactly 3 of a
- a{3,}     3 or more of a
- a{3,6}    Between 3 and 6 of a

Options:

- i  case insensitive
- m  multiline: ^ and $ match at the start and end of each line
- s  make dot match newlines
- x  ignore whitespace in regex
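
The modifiers in the quick reference are easiest to compare side by side. The snippet below is a small sketch with an arbitrary two-line sample text; the variable names are illustrative only.

```php
<?php
$text = "first line\nSecond line";

// i: case-insensitive matching
$i = preg_match('/SECOND/i', $text);

// m: ^ and $ also match at the start and end of each line
$withoutM = preg_match('/^Second/', $text);
$withM    = preg_match('/^Second/m', $text);

// s: the dot also matches newline characters
$withoutS = preg_match('/line.Second/', $text);
$withS    = preg_match('/line.Second/s', $text);

// x: whitespace and # comments inside the pattern are ignored
$x = preg_match('/ first \s line  # two words /x', $text);

echo implode(' ', array($i, $withoutM, $withM, $withoutS, $withS, $x)), "\n";
```

The two "without" variants are expected to return 0 and every other call 1.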
MrBull
5 years ago
Sometimes it's useful to negate a string. The first method that comes to mind is [^(string)], but this of course won't work, because a character class negates single characters and not a sequence. There is a solution, but it is not very well known. This is the simple piece of code that negates a string:

(?:(?!string).)

?: makes a non-capturing subpattern (see http://www.php.net/manual/en/regexp.reference.subpatterns.php) and ?! is a negative lookahead. You put the negative lookahead in front of the dot because you want the regex engine to first check whether the string you are negating occurs at that position. Only if it is not there do you want to match an arbitrary character.

Hope this helps some people.
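
In practice the fragment is usually repeated and anchored. The following sketch assumes the goal is to accept only subjects that do not contain the word "string" at all; the * quantifier and the ^ and $ anchors are additions for this example.

```php
<?php
// Every character must be one at which "string" does not begin
$pattern = '/^(?:(?!string).)*$/';

$clean = preg_match($pattern, 'some text');
$dirty = preg_match($pattern, 'a string of text');

var_dump($clean, $dirty);
```

The first subject should match and the second should not, since the lookahead fails at the position where "string" starts.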
5 years ago
This sample checks for Persian characters:

<?php
preg_match('/[\x{0600}-\x{06FF}]{1,32}/u', 'محمد');
?>
jonathan dot lydall at gmail dot removethispart dot com
8 years ago
Because making a truly correct email validation function is harder than one may think, consider using the one that comes with PHP through the filter_var function (http://www.php.net/manual/en/function.filter-var.php):

<?php
$email = "someone@domain .local";
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "E-mail is not valid";
} else {
    echo "E-mail is valid";
}
?>
ian_channing at hotmail dot com
7 years ago
This is a function that uses regular expressions to match against the various VAT formats required across the EU.

<?php
/**
 * @param string $country    Country name
 * @param string $vat_number VAT number to test e.g. GB123 4567 89
 * @return integer -1 if country not included OR 1 if the VAT Num matches for the country OR 0 if no match
 */
function checkVatNumber($country, $vat_number)
{
    switch ($country) {
        case 'Austria':
            $regex = '/^(AT){0,1}U[0-9]{8}$/i';
            break;
        case 'Belgium':
            $regex = '/^(BE){0,1}[0]{0,1}[0-9]{9}$/i';
            break;
        case 'Bulgaria':
            $regex = '/^(BG){0,1}[0-9]{9,10}$/i';
            break;
        case 'Cyprus':
            $regex = '/^(CY){0,1}[0-9]{8}[A-Z]$/i';
            break;
        case 'Czech Republic':
            $regex = '/^(CZ){0,1}[0-9]{8,10}$/i';
            break;
        case 'Denmark':
            $regex = '/^(DK){0,1}([0-9]{2}[\ ]{0,1}){3}[0-9]{2}$/i';
            break;
        case 'Estonia':
        case 'Germany':
        case 'Greece':
        case 'Portugal':
            $regex = '/^(EE|EL|DE|PT){0,1}[0-9]{9}$/i';
            break;
        case 'France':
            $regex = '/^(FR){0,1}[0-9A-Z]{2}[\ ]{0,1}[0-9]{9}$/i';
            break;
        case 'Finland':
        case 'Hungary':
        case 'Luxembourg':
        case 'Malta':
        case 'Slovenia':
            $regex = '/^(FI|HU|LU|MT|SI){0,1}[0-9]{8}$/i';
            break;
        case 'Ireland':
            $regex = '/^(IE){0,1}[0-9][0-9A-Z\+\*][0-9]{5}[A-Z]$/i';
            break;
        case 'Italy':
        case 'Latvia':
            $regex = '/^(IT|LV){0,1}[0-9]{11}$/i';
            break;
        case 'Lithuania':
            $regex = '/^(LT){0,1}([0-9]{9}|[0-9]{12})$/i';
            break;
        case 'Netherlands':
            $regex = '/^(NL){0,1}[0-9]{9}B[0-9]{2}$/i';
            break;
        case 'Poland':
        case 'Slovakia':
            $regex = '/^(PL|SK){0,1}[0-9]{10}$/i';
            break;
        case 'Romania':
            $regex = '/^(RO){0,1}[0-9]{2,10}$/i';
            break;
        case 'Sweden':
            $regex = '/^(SE){0,1}[0-9]{12}$/i';
            break;
        case 'Spain':
            $regex = '/^(ES){0,1}([0-9A-Z][0-9]{7}[A-Z])|([A-Z][0-9]{7}[0-9A-Z])$/i';
            break;
        case 'United Kingdom':
            $regex = '/^(GB){0,1}([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2})|([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}[\ ]{0,1}[0-9]{3})|((GD|HA)[0-9]{3})$/i';
            break;
        default:
            // country not covered by this function
            return -1;
    }

    return preg_match($regex, $vat_number);
}
?>
akniep at rayo dot info
7 years ago
This sample regexp may be useful if you are working with DB field types.

(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))

For example, if you have a type such as "varchar(255)" or "text", the following fragment

<?php
$type = 'varchar(255)'; // type of field
preg_match('/(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))/', $type, $field);
print_r($field);
?>

will output something like this:

Array
(
    [0] => varchar(255)
    [type] => varchar
    [1] => varchar
    [2] => (255)
    [length] => 255
    [3] => 255
    [4] => 255
)

It wraps the possibly crashing preg_match call by decreasing the PCRE recursion limit, so that the result is a reg-exp error instead of a PHP crash.

<?php
[...]

// decrease the PCRE recursion limit for the (possibly dangerous) preg_match call
$former_recursion_limit = ini_set("pcre.recursion_limit", 10000);

// the wrapped preg_match call
$result = preg_match($pattern, $text);

// reset the PCRE recursion limit to its original value
ini_set("pcre.recursion_limit", $former_recursion_limit);

// if the reg-exp fails due to the decreased recursion limit we may not make any statement, but PHP execution continues
if (PREG_RECURSION_LIMIT_ERROR === preg_last_error()) {
    // react on the failed regular expression here
    $result = [...];

    // do logging or email-sending here
    [...]
} //if
?>

Possible bug (2):
=============
On one of our Windows servers the above example does not crash PHP, but (directly) hits the recursion limit. Here, the problem is that preg_match does not return boolean(false) as the manual above says it should.

In short, preg_match seems to return int(0) instead of the expected boolean(false) if the regular expression could not be executed due to the PCRE recursion limit. So, if preg_match results in int(0), you seem to have to check preg_last_error() to see whether an error occurred.

I noticed that in order to deal with UTF-8 texts, without having to recompile PHP with the PCRE UTF-8 flag enabled, you can just add the following sequence at the start of your pattern: (*UTF8)

For instance, '#(*UTF8)[[:alnum:]]#' will match 'é' where '#[[:alnum:]]#' will not.

I found this very useful tip after hours of research over the web, directly on the PCRE website: http://www.pcre.org/pcre.txt

There is much further information about UTF-8 support in the lib. Hope that will help!

There does not seem to be any mention of the PHP version of the switches that can be used with regular expressions.

preg_match_all('/regular expr/sim', $text).

The s, i and m show where the switches go and which ones are available (that I know about).

The i ignores letter case (this is commonly known, I think).

The s makes the dot match \n (line break) as well, so a match does NOT stop at the end of a line. This is important with multi-line entries, for example text from an editor that needs to be searched.

The m tells the engine it is a multi-line entry and, importantly, lets ^ and $ match at the start and end of each line.

I am hoping this will save someone from the 4 hours of torture that I endured trying to work out this issue.
Gilles A
2 years ago
Using named subpatterns:

Since PCRE 7.0 (PHP >= 5.2.2), named groups can be defined using (?<name>) or (?'name') instead of (?P<name>).

<?php
$str = 'foobar: 2008';

preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
print_r($matches);

// Or
preg_match('/(?\'name\'\w+): (?\'digit\'\d+)/', $str, $matches);
print_r($matches);

// Or
preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>

Result (printed once per call, identical for all three):

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)
daevid at daevid dot com
7 years ago
I just learned about named groups from a Python friend today and was curious if PHP supported them. Guess what, it does!!!

http://www.regular-expressions.info/named.html

<?php
preg_match("/(?P<foo>abc)(.*)(?P<bar>xyz)/",
           'abcdefghijklmnopqrstuvwxyz',
           $matches);
print_r($matches);
?>

will produce:

Array
(
    [0] => abcdefghijklmnopqrstuvwxyz
    [foo] => abc
    [1] => abc
    [2] => defghijklmnopqrstuvw
    [bar] => xyz
    [3] => xyz
)

Note that you actually get the named group as well as the numerical key value too, so if you do use them and you're counting array elements, be aware that your array might be bigger than you initially expect it to be.
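
If only the named entries are wanted, the numeric duplicates can be filtered out afterwards. This is one possible sketch, assuming PHP 5.6 or later for the ARRAY_FILTER_USE_KEY flag; the variable $named is illustrative.

```php
<?php
preg_match('/(?P<foo>abc)(.*)(?P<bar>xyz)/', 'abcdefghijklmnopqrstuvwxyz', $matches);

// Keep only the entries whose key is a string, i.e. the named groups
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

print_r($named);
```

The filtered array should hold just the foo and bar entries, while $matches itself keeps all six elements.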
Yousef Ismaeil Cliprz
3 years ago
Sometimes a hacker disguises a PHP file or shell as an image to attack your website. So if you use move_uploaded_file() to let users upload files, you must check whether the file contains bad code. The function below does that with preg_match(); the usage example then removes a bad file with unlink() (http://php.net/unlink). After you upload a file, check it with the function below.

<?php
/**
 * A simple function to check a file for bad code.
 *
 * @param (string) $file - file path.
 * @return bool true if suspicious code was found (despite the function name), false otherwise.
 * @author Yousef Ismaeil - Cliprz[at]gmail[dot]com.
 */
function is_clean_file ($file)
{
    if (file_exists($file)) {
        $contents = file_get_contents($file);
    } else {
        exit($file." Not exists.");
    }

    if (preg_match('/(base64_|eval|system|shell_|exec|php_)/i', $contents)) {
        return true;
    } elseif (preg_match("#&\#x([0-9a-f]+);#i", $contents)) {
        return true;
    } elseif (preg_match('#&\#([0-9]+);#i', $contents)) {
        return true;
    } elseif (preg_match("#([a-z]*)=([\\'\"]*)script:#iU", $contents)) {
        return true;
    } elseif (preg_match("#([a-z]*)=([\\'\"]*)javascript:#iU", $contents)) {
        return true;
    } elseif (preg_match("#([a-z]*)=([\'\"]*)vbscript:#iU", $contents)) {
        return true;
    } elseif (preg_match("#(<[^>]+)style=([\\'\"]*).*expression\([^>]*>#iU", $contents)) {
        return true;
    } elseif (preg_match("#(<[^>]+)style=([\\'\"]*).*behaviour\([^>]*>#iU", $contents)) {
        return true;
    } elseif (preg_match("#</*(applet|link|style|script|iframe|frame|frameset|html|body|title|div|p|form)[^>]*>#i", $contents)) {
        return true;
    } else {
        return false;
    }
}
?>

Use

<?php
// If the image contains bad code
$image = "simpleimage.png";

if (is_clean_file($image)) {
    echo "Bad codes this is not image";
    unlink($image);
} else {
    echo "This is a real image.";
}
?>
ulli dot luftpumpe at murkymind dot de
4 years ago
Matching a backslash character can be confusing, because double escaping is needed in the pattern: first for PHP, second for the regex engine.

<?php
// match newline control character:
preg_match("/\n/", "\n");  // pattern matches and is stored as control character 0x0A in the pattern string
preg_match('/\n/', "\n");  // very same match, but is stored escaped as 0x5C,0x6E in the pattern string

// trying to match "\'" (2 characters) in a text file, '\\\'' as PHP string:
$subject = file_get_contents('myfile.txt');
preg_match('/\\\'/', $subject);      // DOESN'T MATCH the backslash!!! stored as 0x5C,0x27 (escaped apostrophe), this only matches an apostrophe
preg_match('/\\\\\'/', $subject);    // matches, stored as 0x5C,0x5C,0x27 (escaped backslash and unescaped apostrophe)
preg_match('/\\\\\\\'/', $subject);  // also matches, stored as 0x5C,0x5C,0x5C,0x27 (escaped backslash and escaped apostrophe)

// matching "\n" (2 characters):
preg_match('/\\\\n/', '\\n');
preg_match('/\\\n/', '\\n'); // same match - 3 backslashes are interpreted as 2 in PHP if the following character is not escapable
?>
Jonny 5
4 years ago
Workaround for getting the offset in UTF-8 characters (in some cases mb_strpos might be an option as well):

<?php
if (preg_match($pattern, $haystack, $out, PREG_OFFSET_CAPTURE)) {
    // count the characters that precede the byte offset of the match
    $offset = strlen(utf8_decode(substr($haystack, 0, $out[0][1])));
}
?>
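
The same conversion can be written without utf8_decode(). The sketch below assumes the mbstring extension is loaded and that the haystack is valid UTF-8; the helper name utf8_char_offset() is hypothetical.

```php
<?php
// Hypothetical helper: converts a byte offset reported by preg_match()
// into a character offset, assuming $haystack is valid UTF-8.
function utf8_char_offset(string $haystack, int $byteOffset): int
{
    return mb_strlen(substr($haystack, 0, $byteOffset), 'UTF-8');
}

$haystack = "\xC3\xA9ab"; // "é" (two bytes in UTF-8) followed by "ab"
preg_match('/b/', $haystack, $out, PREG_OFFSET_CAPTURE);

$byteOffset = $out[0][1];
$charOffset = utf8_char_offset($haystack, $byteOffset);

echo $byteOffset, ' ', $charOffset, "\n";
```

Here the byte offset of "b" is 3 while its character offset is 2, because the first character occupies two bytes.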
cmallabon at homesfactory dot com
5 years ago
Just an interesting note. I was updating code to replace ereg() with strpos() and preg_match(), and the thought occurred that preg_match() could quit early when only checking whether a string begins with something, for example

<?php
if (preg_match("/^http/", $url)) {
    // do something
}
?>

vs

<?php
if (strpos($url, "http") === 0) {
    // do something
}
?>

As I guessed, strpos() is always faster (about 2x) for short strings like a URL. But for very long strings of several paragraphs (e.g. a block of XML) that don't start with the needle, preg_match() is twice as fast as strpos(), because it doesn't scan the entire string.

So, if you are searching long strings and expect the check to normally be true (e.g. validating XML), strpos() is much faster, BUT if you expect it to often fail, preg_match() is the better choice.
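
A prefix test can also be written so that a long string is never scanned at all. The following is a sketch using strncmp(), which is not part of the original comparison; the helper name starts_with() is hypothetical and no timings are claimed for it.

```php
<?php
// Hypothetical helper: true if $haystack begins with $prefix.
// strncmp() compares at most strlen($prefix) bytes, so the remainder
// of a long haystack is not examined.
function starts_with(string $haystack, string $prefix): bool
{
    return strncmp($haystack, $prefix, strlen($prefix)) === 0;
}

var_dump(starts_with('http://www.php.net/', 'http'));
var_dump(starts_with('ftp://example.com/', 'http'));
```

The first call should yield true and the second false.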
teracci2002
6 years ago
When you use preg_match() for security purposes or for processing huge amounts of data, you should take backtrack_limit and recursion_limit into consideration.

http://www.php.net/manual/en/pcre.configuration.php

These limits may lead to wrong matching results. You can verify whether you hit them by checking preg_last_error().

http://www.php.net/manual/en/function.preg-last-error.php
ruakuu at NOSPAM dot com
6 years ago
I was working on a site that needed Japanese and alphabetic letters and had to validate input using preg_match. I tried using \p{script}, but it didn't work:

<?php
$pattern = '/^([-a-zA-Z0-9_\p{Katakana}\p{Hiragana}\p{Han}]*)$/u'; // Didn't work
?>

So I tried with ranges and it worked:

<?php
$pattern = '/^[-a-zA-Z0-9_\x{30A0}-\x{30FF}'
         . '\x{3040}-\x{309F}\x{4E00}-\x{9FBF}\s]*$/u';
$match_string = '印刷最安 ニキビ跡除去 ゲームボーイ';

if (preg_match($pattern, $match_string)) {
    echo "Found - pattern $pattern";
} else {
    echo "Not found - pattern $pattern";
}
?>

U+4E00-U+9FBF Kanji
U+3040-U+309F Hiragana
U+30A0-U+30FF Katakana

Hope it's useful, it took me several hours to figure it out.

skds1433 at hotmail dot com
7 years ago
Here is a small tool for someone learning to use regular expressions. It's very basic, and allows you to try different patterns and combinations. I made it to help me, because I like to try different things to get a good understanding of how things work.

<?php
$search = isset($_POST['search']) ? $_POST['search'] : "//";
$match  = isset($_POST['match']) ? $_POST['match'] : "<>";

// escape the submitted values before echoing them back into the form
echo '<form method="post">';
echo 's: <input style="width:400px;" name="search" type="text" value="' . htmlspecialchars($search) . '" /><br />';
echo 'm:<input style="width:400px;" name="match" type="text" value="' . htmlspecialchars($match) . '" /><input type="submit" value="go" /></form><br />';

if (preg_match($search, $match)) {
    echo "matches";
} else {
    echo "no match";
}
?>

wjaspers4 [at] gmail [dot] com
7 years ago
I recently encountered a problem trying to capture multiple instances of named subpatterns from filenames. Therefore, I came up with this function.

The function allows you to pass through flags (in this version they apply to all expressions tested), and generates an array of search results.

Enjoy!

<?php
/**
 * Allows multiple expressions to be tested on one string.
 * This will return a boolean, however you may want to alter this.
 *
 * @author William Jaspers, IV <wjaspers4@gmail.com>
 * @created 2009-02-27 17:00:00 +6:00:00 GMT
 * @access public
 *
 * @param array  $patterns An array of expressions to be tested.
 * @param String $subject  The data to test.
 * @param array  $findings Optional argument to store our results.
 * @param mixed  $flags    Pass-thru argument to allow normal flags to apply to all tested expressions.
 * @param array  $errors   A storage bin for errors
 *
 * @returns bool TRUE if no errors occurred, FALSE otherwise.
 */
function preg_match_multiple(
    array $patterns = array(),
    $subject = null,
    &$findings = array(),
    $flags = PREG_PATTERN_ORDER,
    &$errors = array()
) {
    foreach ($patterns as $name => $pattern) {
        if (1 <= preg_match_all($pattern, $subject, $found, $flags)) {
            $findings[$name] = $found;
        } else {
            if (PREG_NO_ERROR !== ($code = preg_last_error())) {
                $errors[$name] = $code;
            } else {
                $findings[$name] = array();
            }
        }
    }
    return (0 === sizeof($errors));
}
?>
matt
7 years ago
To support large Unicode ranges (i.e. [\x{E000}-\x{FFFD}] or \x{10FFFF}) you must use the 'u' modifier at the end of your expression.
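
As a short illustration, the sketch below matches a range of large code points with the u modifier. The chosen range (the CJK Unified Ideographs block) and the byte-escaped sample string are assumptions made for this example.

```php
<?php
// U+4E00..U+9FFF is the CJK Unified Ideographs block
$pattern = '/^[\x{4E00}-\x{9FFF}]+$/u';

// The two ideographs U+6F22 and U+5B57, written as UTF-8 bytes
$ideographs = "\xE6\xBC\xA2\xE5\xAD\x97";

$matchesCjk   = preg_match($pattern, $ideographs);
$matchesLatin = preg_match($pattern, 'kanji');

var_dump($matchesCjk, $matchesLatin);
```

The ideograph string should match and the Latin string should not.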
splattermania at freenet dot de
7 years ago
As I wasted lots of time looking for a REAL regex for URLs and ended up building it on my own, I now have one that seems to work for all kinds of URLs:

<?php
$regex  = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
?>

Then, the correct way to check against the regex is as follows:

<?php
if (preg_match("/^$regex$/", $url)) {
    return true;
}
?>
Nimja
4 years ago
When using a 'bad words reject string' filter, preg_match is MUCH faster than strpos / stripos, because in the other cases you need a foreach over every word. With efficient programming, the foreach is ONLY faster when the first word in the ban list is found.

(for 12 words, 100,000 iterations, no word found)

stripos - Taken 1.4876 seconds.
strpos - Taken 1.4207 seconds.
preg_match - Taken 0.189 seconds.

Interesting fact: with long words ('averylongwordtospitepreg'), the difference is much smaller; preg_match then takes about 2/3 of the time instead of 1/6.

<?php
$words = array('word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10', 'word11', 'word12');
$teststring = 'ThIs Is A tEsTsTrInG fOr TeStInG.';
$count = 100000;
$find = 0;

$start = microtime(TRUE);
for ($i = 0; $i < $count; $i++) {
    foreach ($words as $word) {
        if (stripos($teststring, $word) !== FALSE) {
            $find++;
            break;
        }
    }
}
echo 'stripos - Taken ' . round(microtime(TRUE) - $start, 4) . ' seconds.' . PHP_EOL;

$start = microtime(TRUE);
for ($i = 0; $i < $count; $i++) {
    foreach ($words as $word) {
        if (strpos($teststring, $word) !== FALSE) {
            $find++;
            break;
        }
    }
}
echo 'strpos - Taken ' . round(microtime(TRUE) - $start, 4) . ' seconds.' . PHP_EOL;

$start = microtime(TRUE);
$pattern = '/';
$div = '';
foreach ($words as $word) {
    // quote the delimiter as well, in case a word contains "/"
    $pattern .= $div . preg_quote($word, '/');
    $div = '|';
}
$pattern .= '/i';
// Pattern could easily be done somewhere else if words are static.
for ($i = 0; $i < $count; $i++) {
    if (preg_match($pattern, $teststring)) {
        $find++;
    }
}
$end = microtime(TRUE);
echo 'preg_match - Taken ' . round($end - $start, 4) . ' seconds.' . PHP_EOL;
?>
Frank
5 years ago
If someone is from a country that accepts decimal numbers in the formats 9.00 and 9,00 (point or comma), number validation would look like this:

<?php
$number_check = "9,99";
// optional sign, then digits with at most one "." or "," as decimal separator
if (preg_match('/^[-+]?[0-9]*[.,]?[0-9]+$/', $number_check)) {
    return TRUE;
}
?>

However, if the number will be written to the database, this comma most probably needs to be replaced with a dot. This can be done with str_replace, i.e.:

<?php
$number_database = str_replace(",", ".", $number_check);
?>

Anonymous
4 years ago
Here is a function that decrements the numbers inside a string (useful for converting a DOM object into a simplexml object).

E.g.: decremente_chaine("somenode->anode[2]->achildnode[3]") will return "somenode->anode[1]->achildnode[2]"

The numbering of the nodes starts from zero in simplexml, but from 1 in DOM xpath objects.

<?php
function decremente_chaine($chaine)
{
    // collect all occurrences of numbers together with their offsets
    preg_match_all("/[0-9]+/", $chaine, $out, PREG_OFFSET_CAPTURE);

    // walk through the occurrences
    for ($i = 0; $i < sizeof($out[0]); $i++) {
        $longueurnombre = strlen((string)$out[0][$i][0]);
        $taillechaine = strlen($chaine);

        // split the string into 3 pieces
        $debut = substr($chaine, 0, $out[0][$i][1]);
        $milieu = ($out[0][$i][0]) - 1;
        $fin = substr($chaine, $out[0][$i][1] + $longueurnombre, $taillechaine);

        // for 10, 100, 1000 etc. shift all later offsets by 1, because the result has one digit less
        if (preg_match('#^10+$#', $out[0][$i][0])) {
            for ($j = $i + 1; $j < sizeof($out[0]); $j++) {
                $out[0][$j][1] = $out[0][$j][1] - 1;
            }
        }
        $chaine = $debut . $milieu . $fin;
    }
    return $chaine;
}
?>

Stefan
6 years ago
I spent a while replacing all my ereg() calls with preg_match(), since ereg() is deprecated as of PHP 5.3.0 and was removed in PHP 7.0.0.

Just a warning regarding the conversion: the two functions behave very similarly, but not exactly alike. Obviously, you will need to delimit your pattern with '/' or '|' characters.

The difference that stumped me was that preg_match overwrites the $matches array regardless of whether a match was found. If no match was found, $matches is simply empty. ereg(), however, would leave $matches alone if a match was not found. In my code, I had repeated calls to ereg, and was populating $matches with each match. I was only interested in the last match. However, with preg_match, if the very last call to the function did not result in a match, the $matches array would be overwritten with a blank value.

Here is an example code snippet to illustrate:

<?php
$test = array('yes', 'no', 'yes', 'no', 'yes', 'no');

foreach ($test as $key => $value) {
    ereg("yes", $value, $matches1);
    preg_match("|yes|", $value, $matches2);
}
print "ereg result: $matches1[0]<br>";
print "preg_match result: $matches2[0]<br>";
?>

The output is:

ereg result: yes
preg_match result:

($matches2[0] in this case is empty)

I believe the preg_match behavior is cleaner. I just thought I would report this to hopefully save others some time.

Ashus
8 years ago
If you need to match specific wildcards in an IP address, you can use this regexp:

<?php
$ip = '10.1.66.22';
$cmp = '10.1.??.*';

$cnt = preg_match('/^'
    . str_replace(
        array('\*', '\?'),
        array('(.*?)', '[0-9]'),
        preg_quote($cmp)) . '$/',
    $ip);

echo $cnt;
?>

where '?' is exactly one digit and '*' is any number of any characters. The $cmp mask can be supplied directly by the user; $cnt equals (int) 1 on match or 0.
ian_channing at hotmail dot com
5 years ago
When trying to check a file path that could be Windows or Unix, it took me quite a few tries to get the escape characters right.

The Unix directory separator must be escaped once and the Windows directory separator must be escaped twice.

This will match path/to/file and path\to\file.exe

preg_match('/^[a-z0-9_.\/\\\]*$/i', $file_string);
plasma
6 years ago
To extract scheme, host, path, etc., simply use

<?php
$url  = 'http://name:pass@';
$url .= 'example.com:10000';
$url .= '/path/to/file.php?a=1&b=2#anchor';

$url_data = parse_url($url);

print_r($url_data);
?>

which prints out something like:

Array
(
    [scheme] => http
    [host] => example.com
    [port] => 10000
    [user] => name
    [pass] => pass
    [path] => /path/to/file.php
    [query] => a=1&b=2
    [fragment] => anchor
)

In my tests parse_url is up to 15x faster than preg_match(_all)!
ayman2243 at gmail dot com
5 years ago
Highlight search words

<?php
function highlight($word, $subject)
{
    $split_subject = explode(" ", $subject);
    $split_word = explode(" ", $word);

    foreach ($split_subject as $k => $v) {
        foreach ($split_word as $k2 => $v2) {
            if ($v2 == $v) {
                $split_subject[$k] = "<span class='highlight'>" . $v . "</span>";
            }
        }
    }

    return implode(' ', $split_subject);
}
?>
sun at drupal dot org
5 years ago
Basic test for invalid UTF-8 that can hijack IE:

<?php
$valid = (preg_match('/^./us', $text) == 1);
?>

See http://api.drupal.org/api/drupal/includes--bootstrap.inc/function/drupal_validate_utf8/7 for details.

---

Test for valid UTF-8 and XML/XHTML character range compatibility:

<?php
$invalid = preg_match('@[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]@u', $text);
?>

Ref: http://www.w3.org/TR/2000/REC-xml-20001006#charsets
andre at koethur dot de
3 years ago
Be aware of bug https://bugs.php.net/bug.php?id=50887 when using subpatterns: unmatched optional subpatterns at the end won't show up in $matches.

Here is a workaround: assign a name to all subpatterns you are interested in, and afterwards merge $match with a constant array containing some reasonable default values:

<?php
if (preg_match('/^(?P<lang>[^;*][^;]*){1}(?:;q=(?P<qval>[0-9.]+))?$/u', 'de', $match)) {
    $match = array_merge(array('lang' => '', 'qval' => ''), $match);
    print_r($match);
}
?>

This outputs:

Array
(
    [lang] => de
    [qval] =>
    [0] => de
    [1] => de
)

Instead of:

Array
(
    [0] => de
    [lang] => de
    [1] => de
)

Anonymous
7 years ago
If your regular expression does not match a long input text when you think it should, you might have hit the PCRE backtrack default limit of 100000. See http://php.net/pcre.backtrack-limit.

wjaspers4[at]gmail[dot]com
8 years ago
I found this rather useful for testing multiple strings when developing a regex pattern.

<?php
/**
 * Runs preg_match on an array of strings and returns a result set.
 * @author wjaspers4[at]gmail[dot]com
 * @param String $expr  The expression to match against
 * @param Array  $batch The array of strings to test.
 * @return Array
 */
function preg_match_batch($expr, $batch = array())
{
    // create a placeholder for our results
    $returnMe = array();

    // for every string in our batch ...
    foreach ($batch as $str) {
        // test it, and dump our findings into $found
        preg_match($expr, $str, $found);

        // append our findings to the placeholder
        $returnMe[$str] = $found;
    }

    return $returnMe;
}
?>
jphansen at uga dot edu
4 years ago
Here's a regex to validate against the schema for common MySQL identifiers:

<?php
$string = "$table_name";
// single quotes keep PHP from interpolating "$_" inside the pattern
if (preg_match('/[^\d\sa-zA-Z$_]/', $string))
    echo "Failed validation";
?>
workhorse at op dot pl
5 years ago
preg_match returns an empty result when trying to validate a $subject that contains carriage returns and line feeds (\r\n). To solve this, use the s modifier in the $pattern string.

<?php
$pattern = '/.*/s';
$valid = preg_match($pattern, $subject, $match);
?>

Anonymous
6 years ago
The regular expression for breaking down a URI reference into its components:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

Source: ietf.org/rfc/rfc2396.txt

marcosc at tekar dot net
7 years ago
When using accented characters and "ñ" (áéíóúñ), preg_match does not work. It is a charset problem; use utf8_decode()/utf8_encode() to fix it.

Jeff Weiss
3 years ago
Example of validating an email address and breaking it into 3 parts (local, domain name, domain suffix).

A case-insensitive email is valid if:

1) local matches letters a..z or characters . - _ +
2) domain name matches letters a..z or characters - _
3) domain suffix matches letters a..z and is between 2 and 4 characters in length

<?php
// the dot before the suffix is escaped so that it only matches a literal "."
preg_match('/(^[a-zA-Z_.+-]+)@([a-zA-Z_-]+)\.([a-zA-Z]{2,4}$)/i', "jeff@nowhere.com", $matches);
print_r($matches);
?>

outputs:

Array
(
    [0] => jeff@nowhere.com
    [1] => jeff
    [2] => nowhere
    [3] => com
)
hessemanj2100 at gmail dot com
3 years ago
The most accurate IPv4 function. It will not allow leading zeros and supports the full address range of 0.0.0.0 - 255.255.255.255.

<?php
function is_ipv4($string)
{
    // The regular expression checks for any number between 0 and 255 followed by a dot (repeated 3 times),
    // then another number between 0 and 255 at the end: the equivalent of an IPv4 address.
    return (bool) preg_match(
        '/^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])' .
        '\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|[0-9])$/',
        $string
    );
}
?>

daniel dot chcouri at gmail dot com
7 years ago

Deleting HTML tags, with exceptions, using regular expressions:

<?php
function removeHtmlTagsWithExceptions($html, $exceptions = null)
{
    if (is_array($exceptions) && !empty($exceptions)) {
        foreach ($exceptions as $exception) {
            $openTagPattern  = '/<(' . $exception . ')(\s.*?)?>/msi';
            $closeTagPattern = '/<\/(' . $exception . ')>/msi';

            // Protect the allowed tags by swapping their brackets for placeholders
            $html = preg_replace(
                array($openTagPattern, $closeTagPattern),
                array('||l|\1\2|r||', '||l|/\1|r||'),
                $html
            );
        }
    }

    // Strip every remaining tag
    $html = preg_replace('/<.*?>/msi', '', $html);

    // Restore the brackets of the protected tags
    if (is_array($exceptions)) {
        $html = str_replace('||l|', '<', $html);
        $html = str_replace('|r||', '>', $html);
    }

    return $html;
}

// example:
print removeHtmlTagsWithExceptions(<<<EOF
<h1>Whatsup?!</h1>
Enjoy <span style="text-color:blue;">that</span> script<br />
<br />
EOF
, array('br'));
?>

Alex Zinchenko
7 years ago

If you need to check whether a string is a serialized representation of a variable (sic!), you can use this:

<?php
$string = "a:0:{}";
if (preg_match("/(a|O|s|b)\x3a[0-9]*?((\x3a((\x7b?(.+)\x7d)|(\x22(.+)\x22\x3b)))|(\x3b))/", $string)) {
    echo "Serialized.";
} else {
    echo "Not serialized.";
}
?>

But don't forget that a serialized string can be VERY big, so matching can be slow, even with the fast preg_* functions.
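A different technique from the pattern in the serialization note above is to let unserialize() itself decide. This is only a sketch under assumptions: the helper name `looks_serialized()` is made up, and the `allowed_classes` option requires PHP 7.0 or later. Even with that option, unserializing untrusted input deserves caution.

```php
<?php
// Hypothetical helper: decides by actually unserializing instead of pattern matching.
// 'allowed_classes' => false (PHP 7.0+) keeps objects from being instantiated.
function looks_serialized($string)
{
    // serialize(false) is the one valid input for which unserialize() returns FALSE.
    if ($string === 'b:0;') {
        return true;
    }

    // @ silences the notice that unserialize() raises on malformed input.
    return @unserialize($string, array('allowed_classes' => false)) !== false;
}

var_dump(looks_serialized('a:0:{}'));          // bool(true)
var_dump(looks_serialized(serialize('text'))); // bool(true)
var_dump(looks_serialized('plain text'));      // bool(false)
```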
kevin
4 years ago

Here is a little function to get an associative array instead of the numeric one.

<?php
function preg_match_assoc($pattern, $subject, $assoc, $flags = 0, $offset = 0)
{
    $matches = array();
    // preg_match() can be called directly; no eval() is needed
    preg_match($pattern, $subject, $matches, $flags, $offset);

    $array = array();
    $n = 0;
    foreach ($matches as $result) {
        $array[$assoc[$n]] = $result;
        $n++;
    }
    return $array;
}
?>

Example of use:

<?php
$assocs = array(
    'all',
    'a-1',
    'i-1',
    'a-2',
    'ia-1',
    'ia-2'
);
$test = preg_match_assoc('#([a-z]+)([0-9]+)([a-z]+)\-([a-z|0-9]+)\-([a-z|0-9]+)#', 'az45rt-df36qz-fg89ih', $assocs);

// $test will contain:
//
// array
// 'all' => string 'az45rt-df36qz-fg89ih' (length=20)
// 'a-1' => string 'az' (length=2)
// 'i-1' => string '45' (length=2)
// 'a-2' => string 'rt' (length=2)
// 'ia-1' => string 'df36qz' (length=6)
// 'ia-2' => string 'fg89ih' (length=6)
//
// Instead of:
//
// array
// 0 => string 'az45rt-df36qz-fg89ih' (length=20)
// 1 => string 'az' (length=2)
// 2 => string '45' (length=2)
// 3 => string 'rt' (length=2)
// 4 => string 'df36qz' (length=6)
// 5 => string 'fg89ih' (length=6)
?>

mulllhausen
5 years ago

I do a fair bit of HTML scraping in conjunction with curl. I always need to know whether I have reached the right page or whether the curl request failed. The main problem I have encountered is HTML tags having unexpected spaces or other characters (especially the &nbsp; character sequence) between them. For example, when requesting a page with a certain set of POST or GET variables the response might be

<a href='blah'><span>data data data</span></a>

but requesting the same page with different POST/GET variables might give the following result:

<a href='blah'> &nbsp;<span>data data data</span></a>

To match both of these tag sequences with the same pattern I use the [\S\s]*? wildcard, which basically means 'match anything at all...but not if you can help it'. So the pattern for the above sequence would be:

<?php
$page1 = "........<a href='blah'><span>data data data</span></a>.........";
$page2 = "........<a href='blah'> &nbsp;<span>data data data</span></a>........";

$w = "[\s\S]*?"; // ungreedy wildcard
$pattern = "/\<a href='blah'\>$w\<span\>data data data\<\/span\>$w\<\/a\>/";

if (preg_match($pattern, $page1, $matches))
    echo "got to page 1. match: [" . print_r($matches, true) . "]\n";
else
    echo "did not get to page 1\n";

if (preg_match($pattern, $page2, $matches))
    echo "got to page 2. match: [" . print_r($matches, true) . "]\n";
else
    echo "did not get to page 2\n";
?>

Dr@ke
6 years ago

Hello,

There is a bug with some new PCRE versions (like 7.9 2009-04-1). In patterns:

\w+ !== [a-zA-Z0-9]+

But it's OK if I replace \w+ with [a-z0-9]+ or [a-zA-Z0-9]+.

phil dot taylor at gmail dot com
8 years ago

If you need to check for .com.br and .com.au and .uk and all the other crazy domain endings, I found that the following expression works well for validating an email address. It's quite generous in what it will allow.

<?php
$email_address = "phil.taylor@a_domain.tv";

if (preg_match("/^[^@]*@[^@]*\.[^@]*$/", $email_address)) {
    return "E-mail address";
}
?>
Dino Korah AT webroot DOT com
8 years ago
preg_match and preg_replace_callback don't match up in the structure of the array that they fill for a match. preg_match, as the example shows, supports named patterns, whereas preg_replace_callback doesn't seem to support them at all. It seems to ignore any named pattern matched.
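On current PHP releases (PHP 7 at least) the callback does receive named groups, so the behaviour described above may be specific to older versions. A quick check, with a pattern and input string made up for this sketch (the closure syntax needs PHP 5.3 or later):

```php
<?php
// Sample pattern and input, made up for this sketch.
$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})/';

$result = preg_replace_callback(
    $pattern,
    function ($m) {
        // $m holds both numeric and named keys, like preg_match()'s $matches.
        return $m['month'] . '/' . $m['year'];
    },
    'Released 2015-12, patched 2016-03.'
);

echo $result, "\n"; // Released 12/2015, patched 03/2016.
```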
itworkarounds at gmail dot com
5 years ago
You can use the following code to accept non-Latin characters (Cyrillic here; other scripts such as Arabic or Greek work the same way through their own \p{...} property) alongside Latin letters, digits, whitespace and hyphens:

<?php
preg_match("/^[a-zA-Z\p{Cyrillic}0-9\s\-]+$/u", "ABC abc 1234 АБВ абв");
?>

solixmexico at outlook dot com
13 days ago

To validate directories on Windows I used this:

<?php
if (preg_match("#^([a-z]{1}\:{1})?[\\\/]?([\-\w]+[\\\/]?)*$#i", $_GET['path'], $matches) !== 1) {
    echo("Invalid value");
} else {
    echo("Valid value");
}
?>

The parts are:

#^ and $#i
Anchor the pattern to the start and end of the string to ensure a complete match (the i modifier makes it case-insensitive).

([a-z]{1}\:{1})?
The string may start with one letter and a colon, but only 1 character of each; this is for the drive letter (C:).

[\\\/]?
The string may contain, but does not require, 1 slash or backslash after the drive letter (\/).

([\-\w]+[\\\/]?)*
The string must have 1 or more characters such as hyphen, letter, number or underscore, and may contain a slash or backslash at the end, to form a directory like "folderName" or "folderName/"; this group may be repeated zero or more times.

Julius
8 months ago

Regarding UTF-8 and offset:

Be aware that the 5th parameter is handled in the same way as the offsets produced through the 4th: in bytes. The $offset parameter should therefore be given as a byte length, even with the u modifier.

<?php
var_dump(preg_match('/#/u', 'a#', $matches, 0, 2));
var_dump(preg_match('/#/u', "\xc3\xa4#", $matches, 0, 2));
var_dump(preg_match('/#/u', "\xc3\xa4#", $matches, 0, 3));
?>

bstefanovic at outlook dot com
7 months ago

This tool really helped me achieve the best possible results with this function: http://wordpresstester.com/

Supriya Karmakar Kolkata
2 years ago

Always escape double quotes to avoid errors, even if you don't need to.

Bad practice:

$foo = preg_match('/<h2 class="bengali">.*?<\/h2>/', $bigTextChunk, $myArray);

Good practice:

$foo = preg_match("/<h2 class=\"bengali\">.*?<\/h2>/", $bigTextChunk, $myArray);

Bad practice can cause mysterious errors, as happened in my case.
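Following up on Julius's note about byte offsets: a byte offset reported by PREG_OFFSET_CAPTURE can be converted to a character offset by counting the characters that precede it. A minimal sketch, assuming the mbstring extension is loaded; the helper name `char_offset()` is made up for illustration.

```php
<?php
// Assumes the mbstring extension is loaded; the helper name is made up for this sketch.
function char_offset($subject, $byteOffset)
{
    // Count the characters in the bytes that precede the match.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$subject = "\xc3\xa4#"; // "ä#": the ä takes two bytes in UTF-8

preg_match('/#/u', $subject, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][1];
echo $byteOffset, "\n";                        // 2 (bytes)
echo char_offset($subject, $byteOffset), "\n"; // 1 (characters)
```

The reverse direction (a character position turned into a byte $offset for the 5th parameter) can be sketched the same way with strlen(mb_substr($subject, 0, $charOffset, 'UTF-8')).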
dkr at dotnull dot de
2 years ago
I noted that PCRE_ANCHORED (the pattern modifier A) works fine when using an offset. If you use the escape sequence \A or even the caret "^" in the regex, it does not work (even in multiline mode)...

<?php
$text = 'foo bar';
print (int) preg_match('/^bar/', $text, $a, 0, 4);  // prints 0
print (int) preg_match('/\Abar/', $text, $a, 0, 4); // prints 0
print (int) preg_match('/bar/A', $text, $a, 0, 4);  // prints 1
?>

Hope this helps someone out there! :-)

Version: PHP 5.5.12

asdfasdasad34535 at iflow dot at
3 years ago

Attention! PREG_OFFSET_CAPTURE is not UTF-8 aware when using the u modifier, and it's not a bug, it's a feature: https://bugs.php.net/bug.php?id=37391

Possible workaround: use mb_strpos to get the correct offset instead of the flag. UTF-8 support would be nice.

hessemanj2100 at gmail dot com
3 years ago

Just a note about my last post. The regular expression for the function I posted contains a question mark at the end. Technically this doesn't need to be there, but it will work with or without it. Just remove it if you don't want it. Enjoy!

aer0s
4 years ago

Simple function to return a sub-string following the preg convention. Kind of expensive, and some might say lazy, but it has saved me time.

# preg_substr($pattern, $subject, [$offset]) function
# @author   aer0s
#  return a specific sub-string in a string using
#   a regular expression
# @param   $pattern   regular expression pattern to match
# @param   $subject   string to search
# @param   [$offset]  zero based match occurrence to return
#
# [$offset] is 0 by default which returns the first occurrence,
# if [$offset] is -1 it will return the last occurrence

function preg_substr($pattern, $subject, $offset = 0)
{
    preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);
    return $offset == -1 ? array_pop($matches[0]) : $matches[0][$offset];
}

Example:

// The + lets the pattern capture the whole model number rather than a single character
$pattern = "/model(\s|-)[a-z0-9]+/i";
$subject = "Is there something wrong with model 654, Model 732, and model 43xl or is Model aj45B the preferred choice?";

echo preg_substr($pattern, $subject);
echo preg_substr($pattern, $subject, 1);
echo preg_substr($pattern, $subject, -1);

Returns something like:

model 654
Model 732
Model aj45B

Validate PAN Number
4 years ago

Validate PAN Number.

[5 Alpha][4 Number][1 Alpha]
AAAAA1111A

function isValidPAN($num)
{
    return preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $num);
}
Testing the speed of preg_match against stripos for case-insensitive searches in strings:

<?php
$string = "Hey, how are you? I'm a string.";

// PCRE
$start = microtime(true);
for ($i = 1; $i < 10000000; $i++) {
    $bool = preg_match('/you/i', $string);
}
$end = microtime(true);
$pcre_lasted = $end - $start; // 8.3078360557556

// Stripos, we believe in you
$start = microtime(true);
for ($i = 1; $i < 10000000; $i++) {
    $bool = stripos($string, 'you') !== false;
}
$end = microtime(true);
$stripos_lasted = $end - $start; // 6.0306038856506

echo "Preg_match lasted: {$pcre_lasted}<br />Stripos lasted: {$stripos_lasted}";
?>

So unless you really need to test a string against a regular expression, prefer strpos / stripos and the other string functions for finding characters and strings within other strings.

Steve Todorov
8 years ago

While reading the preg_match documentation I didn't find how to match an IP. Let's say you need to make a script that works with an IP or a host and you want to show the hostname, not the IP. Well, this is the way to go:

<?php
/* This is an ip that is "GET"/"POST" from somewhere */
$ip = $_POST['ipOrHost'];

// The dots are escaped and the pattern anchored, so only four dot-separated numbers match
if (preg_match('/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/', $ip))
    $host = gethostbyaddr($ip);
else
    $host = gethostbyname($ip);

echo $host;
?>

This is a really simple script made for beginners! If you like, you could add restrictions to the numbers. The code above accepts all kinds of numbers, and we know that an IP address can be at most 255.255.255.255 while the example accepts up to 999.999.999.999.

Wish you luck!
Best wishes,
Steve

Iven Marquardt
2 years ago

If you want to match all printable ASCII (0..127) except some specific chars, try this:

<?php
$excluded = '\$a';
echo preg_replace('~[^' . $excluded . '[:^print:]]~', '', 'abc123ABC!?$%/€');
?>

Result: a$€
The following function works well for validating IP addresses:

<?php
function valid_ip($ip)
{
    return preg_match(
        "/^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])" .
        "(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$/",
        $ip
    );
}
?>

saberdream at live dot fr
6 years ago

I made a function to get around the problem of the length of a string... It verifies that the link points to an image.

<?php
function verifiesimage($lien, $limite)
{
    if (preg_match('#^http:\/\/(.*)\.(gif|png|jpg)$#i', $lien) && strlen($lien) < $limite) {
        $msg = TRUE; // link ok
    } else {
        $msg = FALSE; // the link isn't an image
    }
    return $msg; // return TRUE or FALSE
}
?>

Example:

<?php
if (verifiesimage($votrelien, 50) == TRUE) {
    // we display the content...
}
?>
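As an alternative to the hand-written IP patterns in these notes, PHP's built-in filter_var() can do the range checking. This is a different technique from the regular expressions above, shown only as a sketch; the wrapper name `is_ipv4_builtin()` is made up for illustration.

```php
<?php
// filter_var() with FILTER_VALIDATE_IP is built in since PHP 5.2;
// the wrapper name is made up for this sketch.
function is_ipv4_builtin($ip)
{
    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(is_ipv4_builtin('192.168.0.1'));     // bool(true)
var_dump(is_ipv4_builtin('999.999.999.999')); // bool(false)
var_dump(is_ipv4_builtin('10.0.0'));          // bool(false)
```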