User Contributed Notes 14 notes

up
9
napalm at spiderfish dot net
10 years ago
Note that some PCRE features, such as once-only subpatterns and recursive patterns, are not implemented in PHP versions prior to 5.0.0.

Napalm
up
3
theppg_001 at hotmail dot com
8 years ago
Hi there
This was originally made by someone else, but it didn't work correctly, so I remade it, and as far as I know it works right.

<?php
/**
 * strip_selected_tags ( string str [, string tags [, bool stripContent]] )
 * ---------------------------------------------------------------------
 * Like strip_tags() but inverse; the listed tags will be stripped, not kept.
 * tags: string with tags to strip, ex: "<a><p><quote>" etc.
 * stripContent: TRUE will also strip everything between the opening and closing tag
 */
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    // Extract the tag names from $tags, e.g. "<a><p>" gives array("a", "p").
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);

    foreach ($allTags[1] as $tag) {
        // Build the pattern inside the loop, where $tag is actually defined.
        // \b stops "<b>" from also matching "<br>" or "<body>".
        $replace = "%(<{$tag}\b.*?>)(.*?)(</{$tag}\b.*?>)%is";
        if ($stripContent) {
            // Remove the tags together with everything between them.
            $str = preg_replace($replace, '', $str);
        } else {
            // Remove only the tags and keep the content (capture group 2).
            $str = preg_replace($replace, '${2}', $str);
        }
    }
    return $str;
}
?>

Before I 'fixed' it, when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You would get back
"this is <p align=\"center\">a test</p> and this is bold"
Why? Because it did not take into account that there could be attributes etc. in the HTML tag.
My version works perfectly when stripping just the tags, or the tags and their contents too!

So now when you run
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You get back
"this is a test and this is bold"
Or when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>",true)
You get back
"this is  and "

Hope it helps someone :)
up
1
Daniel Vandersluis
9 years ago
Concerning note #6 in "Differences From Perl", the \G token *is* supported as the last match position anchor. This has been confirmed to work at least in preg_replace(), though I'd assume it'd work in preg_match_all(), and other functions that can make more than one match, as well.
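
A minimal sketch of the \G behaviour described above, assuming a reasonably recent PHP build; the subject strings and variable names are made up for illustration. It shows \G tying each match to the end of the previous one in both preg_replace() and preg_match_all().

```php
<?php
// \G matches only at the position where the previous match ended
// (or at the start of the subject for the first match).

// Replace only the leading run of spaces; the later space is left alone
// because no match ended directly in front of it.
$indented = preg_replace('/\G /', '.', '   a b');

// Collect only the digits that form an unbroken run from the start.
preg_match_all('/\G\d/', '123a45', $matches);
$leadingDigits = $matches[0];

echo $indented, "\n";                    // ...a b
echo implode(',', $leadingDigits), "\n"; // 1,2,3
```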
up
0
sam marshall
7 years ago
For anyone who sees this error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at ...

As this manual page says, you need PHP 5.1.0 and the /u modifier in order to enable these features, but that isn't the only requirement! It is possible to install later versions of PHP (we have 5.1.4) while linking to an older PCRE install. A quick look at the PCRE changelog suggests that you probably need at least PCRE 5; we're running 4.5, while the latest is 7.1. You can find out your PCRE version by checking phpinfo().

I suspect this ancient PCRE version is included in some officially supported Red Hat Enterprise package, which is probably why we are running it, so this might also affect other people.
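
As an alternative to reading phpinfo(), the check can be sketched in code. This assumes the PCRE_VERSION constant is available (it was added in PHP 5.2.4, so it would not exist on the 5.1.4 build mentioned above); the variable names are illustrative only.

```php
<?php
// PCRE_VERSION reports the PCRE library PHP was built against.
// Fall back to a placeholder on old builds where the constant is missing.
$pcreVersion = defined('PCRE_VERSION') ? PCRE_VERSION : 'unknown';
echo "PCRE version: $pcreVersion\n";

// Probe for Unicode property support directly instead of trusting version
// numbers. The @ hides the compilation warning on builds that lack it.
// "\xC3\xA9" is the UTF-8 encoding of "é".
$hasUnicodeProperties = (@preg_match('/^\p{L}+$/u', "\xC3\xA9t\xC3\xA9") === 1);
echo $hasUnicodeProperties
    ? "Unicode properties are supported\n"
    : "Unicode properties are NOT supported\n";
```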
up
0
onerob at gmail dot com
9 years ago
If, like me, you tend to use the /U pattern modifier, then you will need to remember that using ? or * to test for optional characters will match zero characters if that allows the rest of the pattern to continue matching, even if the optional characters exist.

For instance, if we have this string:

a___bcde

and apply this pattern:

'/a(_*).*e/U'

The whole pattern is matched, but none of the _ characters are placed in the sub-pattern. The way around this (if you still wish to use /U) is to use the ? greediness inverter, e.g.:

'/a(_*?).*e/U'
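
The two patterns can be compared side by side. This is only a sketch using the string from the note; the variable names are illustrative.

```php
<?php
$subject = 'a___bcde';

// Under /U every quantifier is lazy by default, so (_*) settles for zero
// underscores and the (also lazy) .* consumes them instead.
preg_match('/a(_*).*e/U', $subject, $lazy);

// A trailing ? inverts the greediness again, making (_*?) greedy under /U.
preg_match('/a(_*?).*e/U', $subject, $greedy);

var_dump($lazy[1]);   // empty string
var_dump($greedy[1]); // "___"
```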
up
0
info at atjeff dot co dot nz
9 years ago
I had never used regular expressions until now and had loads of difficulty trying to convert [url]link here[/url] into an href for use when posting messages on a forum. Here's what I managed to come up with:

$patterns = array(
    "/\[link\](.*?)\[\/link\]/",
    "/\[url\](.*?)\[\/url\]/",
    "/\[img\](.*?)\[\/img\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[u\](.*?)\[\/u\]/",
    "/\[i\](.*?)\[\/i\]/"
);
$replacements = array(
    "<a href=\"\\1\">\\1</a>",
    "<a href=\"\\1\">\\1</a>",
    "<img src=\"\\1\">",
    "<b>\\1</b>",
    "<u>\\1</u>",
    "<i>\\1</i>"
);
$newText = preg_replace($patterns, $replacements, $text);

At first it would collect ALL the tags into one link/bold/whatever, until I added the "?". I still don't fully understand it... but it works :)
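
The reason the "?" helps is that it makes .* lazy: instead of running to the last closing tag in the text, it stops at the first one. A small sketch with a made-up input string shows the difference.

```php
<?php
$text = 'A [b]first[/b] and a [b]second[/b] word.';

// Greedy: (.*) runs to the LAST [/b], so both pairs collapse into one.
$greedy = preg_replace('/\[b\](.*)\[\/b\]/', '<b>\\1</b>', $text);

// Lazy: (.*?) stops at the FIRST [/b], so each pair is converted separately.
$lazy = preg_replace('/\[b\](.*?)\[\/b\]/', '<b>\\1</b>', $text);

echo $greedy, "\n"; // A <b>first[/b] and a [b]second</b> word.
echo $lazy, "\n";   // A <b>first</b> and a <b>second</b> word.
```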
up
0
J Daugherty
10 years ago
In the character class meta-character documentation above, the circumflex (^) is described:

"^   negate the class, but only if the first character"

It should be a little more verbose to fully express the meaning of ^:

^    Negate the character class.  If used, this must be the first character of the class (e.g. "[^012]").
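
A short sketch of the difference the position of ^ makes; the test strings and variable names are arbitrary.

```php
<?php
// ^ as the first character: the class matches anything EXCEPT 0, 1 or 2.
$negatedAccepts3 = preg_match('/^[^012]$/', '3'); // 1
$negatedAccepts1 = preg_match('/^[^012]$/', '1'); // 0

// ^ anywhere else in the class is just a literal circumflex.
$literalAcceptsCaret = preg_match('/^[0^12]$/', '^'); // 1
$literalAccepts3     = preg_match('/^[0^12]$/', '3'); // 0

var_dump($negatedAccepts3, $negatedAccepts1, $literalAcceptsCaret, $literalAccepts3);
```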
up
-2
roland dot illig at gmx dot de
9 years ago
<quote>
9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.
</quote>

The last sentence does not indicate a bug. If the string "a" should match against the regular expression /^(a)?a/, the last "a" in the regex must be matched by any literal "a" in the string. The rest of the string is "", which obviously does not match the first /^(a)/.
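
The unset group can be observed from PHP. This sketch relies on preg_match() leaving trailing groups that did not participate out of the match array when called without extra flags.

```php
<?php
// The optional group first grabs the "a", then has to give it back so that
// the literal "a" after it can match. Group 1 therefore ends up unset.
$result = preg_match('/^(a)?a/', 'a', $m);

var_dump($result);      // int(1): the pattern matches
var_dump(isset($m[1])); // bool(false): group 1 did not participate
```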
up
-1
pstradomski at gmail dot com
7 years ago
About strip_selected_tags function from two posts below:

it does not work if somebody uses tags without ending ">" character, like this:

<p <b> bold text </b</p

This is even valid HTML (but not valid XHTML)
up
-1
chris at madblanks dot org
7 years ago
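When enclosing your regular expression in double quotes, back references require two backslashes.

For example, inside double quotes "\1" is an octal escape that PHP converts to the ASCII character with code 1 before the pattern reaches PCRE. You need to write "\\1" to get the back reference.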
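
A quick sketch that makes the difference visible; the pattern and subject are invented for the demonstration.

```php
<?php
// In double quotes PHP turns \1 into the character with code 1
// before PCRE ever sees the pattern.
$isControlChar = ("\1" === chr(1));

// Doubling the backslash (or using single quotes) passes \1 through to PCRE.
$doubleQuoted = preg_match("/(a)\\1/", 'aa'); // 1
$singleQuoted = preg_match('/(a)\1/', 'aa');  // 1

// With a single backslash the pattern asks for "a" followed by chr(1).
$broken = preg_match("/(a)\1/", 'aa');        // 0

var_dump($isControlChar, $doubleQuoted, $singleQuoted, $broken);
```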
up
-2
mbrodin
6 years ago
Hi!

For even better performance from the code below, use:

<?php
    $f = array();

    foreach ($allTags[1] as $tag) {
        $f[] = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
    }

    if (sizeof($f)) $str = preg_replace($f, ($stripContent ? '' : '${2}'), $str);
?>

This does not call preg_replace once for every tag; instead it collects the patterns in an array and runs a single preg_replace call, which should be faster.

It also checks that there is at least one pattern to replace. If there is none, preg_replace is not called at all! :)

Added the "<?php" so it will highlight the code!
up
-1
Ned Baldessin
9 years ago
Although \w and \W do include as "word characters" locale-specific characters (like "é" if you are using the "fr" locale), \b and \B do not work the same way.

For example :
"foo était bar"   =>   /\W(était)\W/   =>   This captures correctly "était".
"foo était bar"   =>   /\b(était)\b/   =>   This fails to capture it.

This is confusing, because the manual talks in both cases about "word characters", but fails to mention the difference in behaviour.
up
-1
W W W
9 years ago
Back references are a great way to achieve exact matching when it would have been impossible any other way. Take these three strings.

1) "www.www.com"
2) 'www.www.com'
3) "www.www.com'

The regex /^("|').+?("|')$/ would match all three strings but what if you needed the 3rd string above to be illegal because the quotes are not the same? You could write four different regexes to check for every possible case OR you could use back references.

/^("|').+?\1$/ will match strings 1 and 2 but not string 3. Try this code for further proof:

$str_test="'www.www.com\"";
$int_count=preg_match("/^(\"|').+?\\1$/", $str_test, $matches, PREG_OFFSET_CAPTURE);

The preg_match function will not match against $str_test because the quotes are mismatched. If you change $str_test to

$str_test = "'www.www.com'";

the preg_match will work.
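
To check all three strings from the top of this note in one go, something like the following sketch can be used. The pattern is single-quoted here, so a single backslash is enough for the back reference.

```php
<?php
$candidates = array(
    '"www.www.com"',   // 1) matching double quotes
    "'www.www.com'",   // 2) matching single quotes
    "\"www.www.com'",  // 3) mismatched quotes
);

$results = array();
foreach ($candidates as $candidate) {
    // \1 must repeat whatever quote character group 1 captured.
    $results[$candidate] = preg_match('/^("|\').+?\1$/', $candidate);
}

print_r($results); // strings 1 and 2 give 1, string 3 gives 0
```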
up
-3
datacompboy at call2ru dot com
7 years ago
For example, say you want to cut out a <div> element,
accurately, from <div> to the corresponding </div>.
Here is proof-of-concept code to do this:

<?php
$str = "<dqiv1>1+<div2>2+<div3><b><c>3</c></b></div3>2-</div2>1-</div1>";

preg_match("#<div.> ( ".
              " ( (?>[^<]*) ( < ( ([^/d]|d([^i]|i[^v])) | /([^d]|d([^i]|i[^v])) ) )? )* ".
           " | (?R) )* </div.>#xi", $str, $m);
var_dump($m[0]);

?>

It matches accurately from <div2> to </div2>. And if you change <dqiv1> to <div1>, it will match from <div1> to </div1>.