Danno, your script has a flaw.
Try this:
<?php
function strip_tags_keep_links($sSource)
{
return preg_replace_callback('/<(.*?)>/i', function ($m) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/\b((?![hH][rR][eE][fF]\b)\w+)[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, '<a>'));
}
$source = "<a href=javascript:alert('doesn\'t work!') title=\"move your mouse here\" href=http://www.a_web_site.org onmouseover\n=\nalert(\"doesn\'t work!\") onmouseover='alert(\"doesn\'t work!\")' alt=\"move your mouse here\" > test</a>";
$result=strip_tags_keep_links($source);
echo($result);
?>
strip_tags
Massoud Abbagash
07-May-2008 03:56
Massoud Abbagash
07-May-2008 03:26
There is still a flaw in your function.
Look at this: the [onmouseover] handler in the sample script below survives, even after processing with [strip_tags_attributes].
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
$source="<big onmouseover=alert('Hello!')>Move your mouse here (this doesn't work with [ strip_tags_attributes ])</big>";
$striped_source=strip_tags_attributes($source,array('<big>'));
echo($striped_source);
?>
Now, this is my correction:
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace('/\s(' . implode('|', $aDisabledAttributes) . ').*?([\s\>])/', '\\2', preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags))));
}
$source="<big onmouseover=alert('Hello!')>Move your mouse here (this works with [ strip_tags_attributes corrected ])</big>";
$striped_source=strip_tags_attributes($source,array('<big>'));
echo($striped_source);
?>
bluej100@gmail
02-May-2008 12:31
Allowing user HTML while preventing XSS is non-trivial. Don't just try to hack together a regexp for it; at the very least, check your solution against all of the ha.ckers.org exploit examples:
http://ha.ckers.org/xss.html
Really, though, you should be using a solid library that recognizes tags, attributes, and styles from a whitelist and rebuilds the markup from scratch. HTMLPurifier has a "linkify" option that does what you're looking for.
http://htmlpurifier.org
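To illustrate the "whitelist and rebuild from scratch" idea with nothing but PHP's bundled DOM extension (this is not HTMLPurifier, whose API isn't shown here), here is a minimal sketch. The function names sanitize_whitelist() and rebuild_node(), the default whitelist, and the rule that only http:, https: and mailto: links survive (relative links are dropped too) are all assumptions made for illustration. It is not a vetted replacement for a maintained library.

```php
<?php
// Hypothetical sketch: parse the fragment with DOMDocument, then rebuild the
// markup from scratch, emitting only whitelisted tags and attributes.
function sanitize_whitelist(string $html, array $allowed = array('a' => array('href'), 'b' => array(), 'i' => array(), 'p' => array())): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : rebuild_node($body, $allowed);
}

// Recursively serializes the children of $node, keeping only what $allowed permits.
function rebuild_node(DOMNode $node, array $allowed): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Text (and CDATA) is always re-escaped, never passed through raw.
            $out .= htmlspecialchars($child->nodeValue, ENT_QUOTES, 'UTF-8');
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->nodeName);
            if ($tag === 'script' || $tag === 'style') {
                continue; // drop these elements together with their contents
            }
            $inner = rebuild_node($child, $allowed);
            if (!isset($allowed[$tag])) {
                $out .= $inner; // tag not whitelisted: keep only its (sanitized) content
                continue;
            }
            $attrs = '';
            foreach ($allowed[$tag] as $name) {
                if (!$child->hasAttribute($name)) {
                    continue;
                }
                $value = $child->getAttribute($name);
                // Only absolute http(s) and mailto links survive; javascript: etc. are dropped.
                if ($name === 'href' && !preg_match('/^(https?:|mailto:)/i', $value)) {
                    continue;
                }
                $attrs .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
            }
            $out .= '<' . $tag . $attrs . '>' . $inner . '</' . $tag . '>';
        }
        // Comments, processing instructions and other node types are dropped.
    }
    return $out;
}

echo sanitize_whitelist('<p onclick="evil()">Hi <a href="javascript:alert(1)">x</a></p>');
```

Because the output is generated from the parsed tree rather than patched with regexes, anything the code does not explicitly emit (event handlers, unknown attributes, comments) simply never reaches the result.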
LK
19-Apr-2008 04:30
Concerning all of the notes about which attributes to include in strip_tags_attributes(), the latest of which is by Kalle Sommer Nielsen:
Correct me if I'm wrong, but isn't it a lot easier to simply reject any attribute that starts with "on"? Thus, the whole array of various javascript attributes could be replaced with "on\w+".
I am not aware of any non-javascript attributes that start with these two letters, and if there were, it would be easier to make an exception for them than for the countless JS attributes.
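A rough sketch of that suggestion, using a hypothetical helper strip_on_attributes(): every attribute whose name starts with "on" is removed, whether its value is double-quoted, single-quoted or unquoted, and with optional whitespace around the "=". It is still regex-based, so it inherits the weaknesses bluej100 describes. For example, it does nothing about javascript: URLs, and a ">" inside a quoted attribute value can hide a later on* attribute from it.

```php
<?php
// Hypothetical helper: remove every on* attribute (onclick, onmouseenter, ...)
// from each tag instead of maintaining a list of known event names.
function strip_on_attributes(string $html)
{
    return preg_replace_callback('/<[^>]*>/', function (array $m) {
        // Matches on...="...", on...='...' or an unquoted on...=value
        return preg_replace('/\s+on\w+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]*)/i', '', $m[0]);
    }, $html);
}

echo strip_on_attributes("<big onmouseover=alert('Hello!')>Move your mouse here</big>");
```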
Danno
08-Apr-2008 01:20
Hi everyone,
I came across this thread looking for a way to strip out all tags except links, leaving only the HREF attribute. I took what you guys have worked on and made it allow only the HREF attribute. This way, even if the spec changes, you are sure not to let any JavaScript sneak in; who knows what the future will bring :P . So I think it's pretty tight; take a look at it and modify it if you see any holes.
<?php
function strip_tags_keep_links($sSource)
{
return preg_replace_callback('/<(.*?)>/i', function ($m) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/\b((?![hH][rR][eE][fF]\b)\w+)[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, '<a>'));
}
?>
Kalle Sommer Nielsen
30-Mar-2008 03:05
This adds a lot of missing JavaScript events to the strip_tags_attributes() function from the entries below.
Props to MSDN for lots of them ;)
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
sych
13-Mar-2008 08:26
brian, this solution is not good, because there will always be events you forget. For example, with this code you are still vulnerable to the "onMouseEnter" attribute and tons of others that actually exist in the JavaScript specs.
brian at diamondsea dot com
03-Mar-2008 11:47
An update to agolna's update of sbritton's function:
Adds additional javascript events to the aDisabledAttributes array.
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onblur', 'onchange', 'onclick', 'ondblclick', 'onerror', 'onfocus', 'onkeydown', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseover', 'onmouseup', 'onreset', 'onresize', 'onselect', 'onsubmit', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
agolna at gmail dot com
28-Feb-2008 07:37
An update to sbritton's function:
If there is whitespace around the = sign between the attribute name and its value, the attribute bypasses the regex. This update fixes that.
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
ZlobnyNigga
21-Feb-2008 08:22
sbritton's function is not so good...
<?php
$str = "<p onmouseover = 'alert(1);'>123</p>";
echo strip_tags_attributes($str);
?>
sbritton
04-Feb-2008 10:35
The function below corrects a typo in y5's function to strip tags and attributes - it also adds lithium1330's recommended 's' parameter:
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')=["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
lithium1330[(at)]msn.com
25-Jan-2008 04:02
Please note: in the code given by y5, Tony Freeman, tREXX [www.trexx.ch] and maybe others, you need to add the "s" modifier to the regex that matches the whole tag, /<(.*?)>/, in order to strip attributes that have a line break before them; otherwise those attributes won't be stripped.
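A quick way to see the effect of the "s" modifier (the sample string is made up): without it, "." does not match a newline, so an opening tag whose attributes are split across lines is never matched by /<(.*?)>/, and its attributes are never inspected at all.

```php
<?php
// A tag with a line break before its onclick attribute.
$html = "<p\nonclick=\"alert(1)\">hi</p>";

preg_match_all('/<(.*?)>/i', $html, $without);  // "." stops at the newline
preg_match_all('/<(.*?)>/is', $html, $with);    // with "s", "." also matches "\n"

print_r($without[0]); // only the closing "</p>" is found
print_r($with[0]);    // both the opening tag and "</p>" are found
```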
bstrick at gmail dot com
15-Jan-2008 09:52
This will strip all PHP and HTML out of a file, leaving only plain text.
// Open the search file
$file = fopen($filename, 'r');
// Get rid of all PHP code.
$search = array('/<\?((?!\?>).)*\?>/s');
$text = fread($file, filesize($filename));
$new = strip_tags(preg_replace($search, '', $text));
echo $new;
fclose($file);
- Strick
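One caveat with the regex above: a "?>" that appears inside a PHP string literal ends the match early, so the rest of that PHP block leaks into the output. A hedged alternative (a different technique, swapped in here) is to let PHP's own tokenizer, token_get_all(), decide what is PHP and what is not. The helper name strip_php_and_html() is made up for this sketch, and it assumes the tokenizer extension is enabled (it is by default).

```php
<?php
// Hypothetical helper: keep only the text outside PHP tags, using PHP's own
// tokenizer (ext/tokenizer), then strip the remaining HTML with strip_tags().
function strip_php_and_html(string $source): string
{
    $html = '';
    foreach (token_get_all($source) as $token) {
        // Text outside the PHP open/close tags is reported as T_INLINE_HTML.
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $html .= $token[1];
        }
    }
    return strip_tags($html);
}

// The close tag inside the string literal does not end the PHP block for the tokenizer.
echo strip_php_and_html("<p>Hi</p><?php echo '?>'; ?>done");
```

To use it on a file, pass it the contents, e.g. strip_php_and_html(file_get_contents($filename)).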
y5
15-Jan-2008 08:59
An improved version of tREXX and Tony Freeman's code, this keeps the code clean while removing unwanted attributes, including the javascript: protocol. Unlike the built-in strip_tags() function, this takes an array for allowed tags, rather than a string. For example: array('<a>', '<object>');
I don't understand why the built-in function uses a string.. oh well =)
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
{
if (empty($aDisabledEvents)) return strip_tags($sSource, implode('', $aAllowedTags));
return preg_replace_callback('/<(.*?)>/i', function ($m) use ($aDisabledAttributes) {
return '<' . preg_replace(array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')=["\'][^"\']*["\']/i', '/\s+/'), array('', '', ' '), $m[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
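As an aside to y5's remark about strings: since PHP 7.4 the built-in strip_tags() also accepts $allowed_tags as an array of tag names (without angle brackets), so the implode() step is only needed on older versions. Keep in mind that strip_tags() leaves every attribute on an allowed tag untouched, which is why the attribute filtering discussed on this page is needed at all.

```php
<?php
// Requires PHP >= 7.4 for the array form of $allowed_tags.
$html = '<p onclick="x()">Hello <b>world</b><br><i>!</i></p>';

// <br> and <i> are removed; <p> and <b> are kept, attributes and all.
echo strip_tags($html, array('p', 'b'));
```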
Enzo_01 at abv dot bg
08-Jan-2008 12:42
This is a simple function to strip BBCode tags; it works well for me. :)
<?php
function bb_strip($s) {
return preg_replace('/\[\/?[^\] ]*\/?\]/', '', $s);
}
?>
blackjackdevel at gmail dot com
31-Oct-2007 08:06
I slightly modified mrmaxxx333's function, which wouldn't work with an href in single quotes; I also removed or modified some syntax. I tested it here and it works:
<?php
$String="<a href='blah.com'>welcome to blah</a>";
$msgStrip = preg_replace('/<a\s+[^>]*?href=["\']([^"\']+)["\'][^>]*>([^<]+)<\/a>/is', '\2 (\1)', $String);
echo $msgStrip;
?>
It will output: welcome to blah (blah.com)
Matthieu Larcher
27-Jun-2007 08:44
I noticed some problems with the strip_selected_tags() function below: sometimes big chunks of content were removed...
Here is a modified version that should work better.
<?php
function strip_selected_tags($text, $tags = array())
{
$args = func_get_args();
$text = array_shift($args);
$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
foreach ($tags as $tag){
while(preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)){
$text = str_replace($found[0],$found[2],$text);
}
}
return preg_replace('/(<('.join('|',$tags).')(|\W.*)\/>)/iusU', '', $text);
}
?>
birwin at suddensales dot com
23-Jun-2007 12:18
This is an upgrade to the illegal characters script by rodt. This script will handle the input even if one or all of the fields contain arrays. Of course, another loop could be added to handle arrays nested within arrays, but if you are savvy enough to be using nested arrays, you don't need me to rewrite the program.
<?php
function screenForm($ary_check_for_html)
{
// check array - reject if any content contains HTML.
foreach($ary_check_for_html as $field_value)
{
if(is_array($field_value))
{
foreach($field_value as $field_array) // if the field value is an array, step through it
{
$stripped = strip_tags($field_array);
if($field_array!=$stripped)
{
// something in the field value was HTML
return false;
}
}
}else{
$stripped = strip_tags($field_value);
if($field_value!=$stripped)
{
// something in the field value was HTML
return false;
}
}
}
return true;
}
?>
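For the nested case birwin mentions, a recursive variant avoids extra loops altogether. A minimal sketch, assuming a hypothetical name screenFormRecursive(): array_walk_recursive() visits every leaf value however deeply it is nested, and the same strip_tags() comparison as the original is applied to each string leaf.

```php
<?php
// Hypothetical recursive variant of screenForm(): returns false as soon as any
// string value, at any nesting depth, is changed by strip_tags().
function screenFormRecursive(array $fields): bool
{
    $clean = true;
    array_walk_recursive($fields, function ($value) use (&$clean) {
        // Same test as the original: if strip_tags() changes the value, it held markup.
        if (is_string($value) && $value !== strip_tags($value)) {
            $clean = false;
        }
    });
    return $clean;
}

var_dump(screenFormRecursive(array('name' => 'Bob', 'tags' => array('a', array('b', '<b>x</b>'))))); // bool(false)
var_dump(screenFormRecursive(array('name' => 'Bob', 'tags' => array('a', array('b')))));              // bool(true)
```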
geersc at hotmail dot com
12-May-2007 03:13
Hi,
I made the following adjustments to the "stripeentag()" function listed here.
Improvements are always welcome.
Regards,
Chris
<?php
function strip_attributes($msg, $tag, $attr, $suffix = "")
{
$lengthfirst = 0;
while (strstr(substr($msg, $lengthfirst), "<$tag ") != "")
{
$tag_start = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");
$partafterwith = substr($msg, $tag_start);
$img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
$img = str_replace(" =", "=", $img);
$out = "<$tag";
for($i=0; $i < count($attr); $i++)
{
if (empty($attr[$i])) {
continue;
}
$long_val =
(strpos($img, " ", strpos($img, $attr[$i] . "=")) === FALSE) ?
strpos($img, ">", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1) :
strpos($img, " ", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1);
$val = substr($img, strpos($img, $attr[$i] . "=" ) + strlen($attr[$i]) + 1, $long_val);
if (!empty($val)) {
$out .= " " . $attr[$i] . "=" . $val;
}
}
if (!empty($suffix)) {
$out .= " " . $suffix;
}
$out .= ">";
$partafter = substr($partafterwith, strpos($partafterwith,">") + 1);
$msg = substr($msg, 0, $tag_start). $out. $partafter;
$lengthfirst = $tag_start + 3;
}
return $msg;
}
?>
lucky760 at yahoo dot com
22-Feb-2007 08:52
I needed a way to allow user comments to contain only hyperlinks as the only allowed HTML tags. This is easy enough to accomplish, but I also needed a way to convert full URLs into hyperlinks, and this complicated things a bit.
The functions below are not very elegant, but they do the job. Function strip_tags_except() works similarly to the strip_selected_tags() function defined a few times on this page, but instead of specifying the tags to strip, the user specifies the tags to allow, and all others are stripped. The third parameter, $strip, when TRUE removes "<" and ">" from the string and when FALSE converts them to "&lt;" and "&gt;" respectively.
Function url_to_link() simply converts full URLs into an equivalent hyperlink taking into consideration that users may end a URL with a character that's not actually part of the address.
When using both, url_to_link() should be called before strip_tags_except(). Here's an example as we are using it on http://www.VideoSift.com:
<?php
$summary = url_to_link($summary);
$summary = strip_tags_except($summary, array('a'), FALSE);
?>
Here are the function definitions:
<?php
function strip_tags_except($text, $allowed_tags, $strip=TRUE) {
if (!is_array($allowed_tags))
return $text;
if (!count($allowed_tags))
return $text;
$open = $strip ? '' : '&lt;';
$close = $strip ? '' : '&gt;';
preg_match_all('!<\s*(/)?\s*([a-zA-Z]+)[^>]*>!',
$text, $all_tags);
array_shift($all_tags);
$slashes = $all_tags[0];
$all_tags = $all_tags[1];
foreach ($all_tags as $i => $tag) {
if (in_array($tag, $allowed_tags))
continue;
$text =
preg_replace('!<(\s*' . $slashes[$i] . '\s*' .
$tag . '[^>]*)>!', $open . '$1' . $close,
$text);
}
return $text;
}
function url_to_link($text) {
$text =
preg_replace('!(^|([^\'"]\s*))' .
'([hf][tps]{2,4}:\/\/[^\s<>"\'()]{4,})!mi',
'$2<a href="$3">$3</a>', $text);
$text =
preg_replace('!<a href="([^"]+)[\.:,\]]">!',
'<a href="$1">', $text);
$text = preg_replace('!([\.:,\]])</a>!', '</a>$1',
$text);
return $text;
}
?>
rodt
16-Jan-2007 07:46
I have used this function successfully to prevent bots inserting HTML to web forms. Put the fields' contents into an array, then feed array to this function as an argument. Returns false if HTML is included; true if there is no HTML in any of the array's values. Hope it's helpful to someone.
<?php
/*
Checks that there is no HTML in any of the provided fields.
$ary_check_for_html = Array to check for HTML content.
*/
function screenForm($ary_check_for_html){
// check array - reject if any content contains HTML.
foreach($ary_check_for_html as $field_value) {
$stripped = strip_tags($field_value);
if($field_value!=$stripped) { // something in the field value was HTML
return false;
}
}
return true;
}
?>
uersoy at tnn dot net
26-Dec-2006 05:18
admin at automapit dot com's function is great. It cleans everything I don't need :). But there is a small problem: the strip-style-tags line should come before the strip-HTML-tags line. Otherwise, the HTML-tag pattern removes the <style></style> tags themselves, and whatever was between them stays behind as text.
<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
'@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
'@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags
'@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>
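To see the difference uersoy describes, here is a small comparison on a made-up document, running the style and tag patterns from these two html2txt() versions in each order (the script and comment patterns are left out because they don't affect this input).

```php
<?php
$doc = '<style>p { color: red }</style><p>Hi</p>';

$tagsFirst  = array('@<[\/\!]*?[^<>]*?>@si', '@<style[^>]*?>.*?</style>@siU');
$styleFirst = array('@<style[^>]*?>.*?</style>@siU', '@<[\/\!]*?[^<>]*?>@si');

echo preg_replace($tagsFirst, '', $doc), "\n";  // the CSS leaks into the text
echo preg_replace($styleFirst, '', $doc), "\n"; // only "Hi" is left
```

Also note that the U modifier inverts greediness, so in @<style[^>]*?>.*?</style>@siU the .*? is actually greedy: with two <style> blocks, everything from the first <style> to the last </style>, including ordinary text in between, is removed.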
bermi ferrer
27-Nov-2006 01:40
Here is a faster and tested version of strip_selected_tags.
Previous example had a small bug that has been fixed now.
<?php
function strip_selected_tags($text, $tags = array())
{
$args = func_get_args();
$text = array_shift($args);
$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
foreach ($tags as $tag){
if( preg_match_all( '/<'.$tag.'[^>]*>([^<]*)<\/'.$tag.'>/iu', $text, $found) ){
$text = str_replace($found[0],$found[1],$text);
}
}
return preg_replace( '/(<('.join('|',$tags).')(\\n|\\r|.)*\/>)/iu', '', $text);
}
?>
bermi ferrer at (google it yourself :P )
24-Nov-2006 09:08
This is Salavert's function, improved with suggestions from this page, as it has been refactored for the Akelos Framework (http://www.akelos.org) by Jose Salavert.
Please note that the "u" modifier needs to be lowercase. This function will also remove self-closing tags (XHTML <br /> <hr />) and will work if the text contains line breaks.
<?php
function strip_selected_tags($text, $tags = array())
{
$args = func_get_args();
$text = array_shift($args);
$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
foreach ($tags as $tag){
if(preg_match_all('/<'.$tag.'[^>]*>((\\n|\\r|.)*)<\/'. $tag .'>/iu', $text, $found)){
$text = str_replace($found[0],$found[1],$text);
}
}
return preg_replace('/(<('.join('|',$tags).')(\\n|\\r|.)*\/>)/iu', '', $text);
}
?>
computer at dharma dot org
13-Nov-2006 09:08
Thanks for the strip_selected_tags code Jose. :-)
Peace,
Charlie
David
05-Nov-2006 11:29
<?php
/**
* strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
* ---------------------------------------------------------------------
* Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
* strip_tags: string with tags to strip, ex: "<a><p><quote>" etc.
* strip_content flag: TRUE will also strip everything between open and closed tag
*/
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
preg_match_all("/<([^>]+)>/i",$tags,$allTags,PREG_PATTERN_ORDER);
foreach ($allTags[1] as $tag){
if ($stripContent) {
$str = preg_replace("/<".$tag."[^>]*>.*<\/".$tag.">/iU","",$str);
}
$str = preg_replace("/<\/?".$tag."[^>]*>/iU","",$str);
}
return $str;
}
?>
anonymous
01-Nov-2006 04:52
A different approach to cleaning up HTML would be to first escape all unsafe characters:
& to &amp;
< to &lt;
> to &gt;
then to unescape matching pairs of tags back (e.g. "&lt;b&gt;hello&lt;/b&gt;" => "<b>hello</b>") if they are identified as safe.
This backwards-approach should be safer because if a tag is not identified correctly, it is, at the end, in an escaped state.
So if a user enters invalid html, or tags that are unsupported or unwanted, they are shown in plain text, and not stripped away. This is good, because the characters "<" and ">" might have been used in a different way (e.g. to make a text arrow: "a <=> b").
This is the case in most forums (apart from the fact that they use "[tag]"-tags instead of "<tag>"-tags)
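A minimal sketch of that escape-first approach, assuming a hypothetical escape_then_restore() and a small whitelist of attribute-less tags (b, i, u). Everything is escaped with htmlspecialchars() first; then only properly paired, attribute-free whitelisted tags are turned back into markup, repeating until no more pairs are found so nested pairs are restored too. Anything else stays visible as plain text.

```php
<?php
// Hypothetical sketch: escape everything, then restore only matching pairs of
// whitelisted tags that carry no attributes.
function escape_then_restore(string $text, array $safeTags = array('b', 'i', 'u')): string
{
    // Step 1: nothing is markup any more (&, <, >, quotes are all escaped).
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: turn &lt;b&gt;...&lt;/b&gt; back into <b>...</b>, one layer per pass.
    $names = implode('|', array_map(function ($t) { return preg_quote($t, '/'); }, $safeTags));
    do {
        $escaped = preg_replace('/&lt;(' . $names . ')&gt;(.*?)&lt;\/\1&gt;/is', '<$1>$2</$1>', $escaped, -1, $count);
    } while ($count > 0);

    return $escaped;
}

echo escape_then_restore('a <=> b, <b><i>bold italic</i></b>, <script>x()</script>');
```

An unclosed "<b>" or a tag with attributes never matches the restore pattern, so it is shown escaped rather than silently stripped, which is exactly the behavior described above.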
pierresyraud at hotmail dot com
05-Oct-2006 12:43
An inverse function: it strips all the text and keeps only the HTML tags!
<?php
function strip_text($a){
$i=-1;$n='';$ok=1; // $ok == 1: outside a tag (skip), $ok == 0: inside a tag (copy)
while(isset($a[++$i])){
if($ok&&$a[$i]!='<'){continue;}
elseif($a[$i]=='>'){$ok=1;$n.='>';continue;}
elseif($a[$i]=='<'){$ok=0;}
if(!$ok){$n.=$a[$i];}}
return $n;}
?>
magdolen at elepha dot info
01-Oct-2006 07:24
I edited the strip_selected_tags function that salavert created so that it also strips single (self-closing) tags (XHTML only).
Here it is, also with metric's modification:
function strip_selected_tags($text, $tags = array()) {
$args = func_get_args();
// metric edit
$text = preg_replace("/\r\n|\n|\r/","",array_shift($args));
$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
foreach ($tags as $tag){
if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
$text = str_replace($found[0],$found[1],$text);
}
// hrax edit
if(preg_match_all('/<'.$tag.'.*\/>/iU', $text, $found)){
$text = str_replace($found[0], "", $text);
}
}
return $text;
}
jausions at php dot net
18-Sep-2006 11:57
To sanitize any user input, you should also consider PEAR's HTML_Safe package.
http://pear.php.net/package/HTML_Safe
bfmaster_duran at yahoo dot com dot br
14-Sep-2006 06:32
I made this function with regular expressions to remove some style properties from tags, based on other examples here ;D
<?php
function removeAttributes($htmlText)
{
$stripAttrib = "'\\s(class)=\"(.*?)\"'i"; //remove classes from html tags;
$htmlText = stripslashes($htmlText);
$htmlText = preg_replace($stripAttrib, '', $htmlText);
$stripAttrib = "/(font\-size|color|font\-family|line\-height):\\s".
"(\\d+(\\x2E\\d+\\w+|\\W)|\\w+)(;|)(\\s|)/i";
//remove font-style,color,font-family,line-height from style tags in the text;
$htmlText = preg_replace($stripAttrib, '', $htmlText);
$htmlText = str_replace(" style=\"\"", '', $htmlText); //remove empty style tags, after the preg_replace above (style="");
return $htmlText;
}
function removeEvilTags($source)
{
return preg_replace_callback('/<(.*?)>/i', function ($m) {
return '<' . removeAttributes($m[1]) . '>';
}, $source);
}
?>
Usage:
<?php
$text = '<p style="line-height: 150%; font-weight: bold" class="MsoNormal"><span style="font-size: 10.5pt; line-height: 150%; font-family: Verdana">Com o compromisso de pioneirismo e aprimoramento, características da Oftalmoclínica, novos equipamentos foram adquiridos para exames e diagnósticos ainda mais precisos:</span></p>'; //This text is in brazillian portuguese ;D
echo htmlentities(removeEvilTags($text))."\r\n";
//This is return: <p style="font-weight: bold"><span>Com o compromisso de pioneirismo e aprimoramento, características da Oftalmoclínica, novos equipamentos foram adquiridos para exames e diagnósticos ainda mais precisos:</span></p>
?>
W0oT ! This is fantastic !
If you find an error, please report it to me by mail ;D
(Y)
metric at 152 dot org
10-Aug-2006 11:46
I tried using the strip_selected_tags function that salavert created. It works really well for single-line text, but if the text contains hard returns it can't find the closing tag.
I altered the line where it shifts the text into a variable so that it removes line returns from any OS:
$text = preg_replace("/\r\n|\n|\r/","",array_shift($args));
admin at automapit dot com
09-Aug-2006 10:01
<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
'@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags
'@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
'@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>
This function turns HTML into text: it strips tags, comments spanning multiple lines (including CDATA), and anything else that gets in its way.
It's a Frankenstein function I made from bits picked up on my travels through the web; thanks to the many who have unwittingly contributed!
09-Aug-2006 02:08
<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
'@<[\\/\\!]*?[^<>]*?>@si', // Strip out HTML tags
'@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
'@<![\\s\\S]*?--[ \\t\\n\\r]*>@' // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>
This function turns HTML into text: it strips tags, comments spanning multiple lines (including CDATA), and anything else that gets in its way.
It's a Frankenstein function I made from bits picked up on my travels through the web; thanks to the many who have unwittingly contributed!
elgios at gmail dot com
05-Aug-2006 12:33
I think the new function works, but it doesn't remove PHP tags, only HTML!!
<?php
function theRealStripTags2($string)
{
$tam=strlen($string);
// $tam holds the number of characters in the string
$newstring="";
// $newstring will be returned
$tag=0;
/* if $tag == 0 => copy the character from $string to $newstring
if $tag > 0 => don't copy. We found one or more '<' and need
to search for '>'. If we found 3 '<' we need to find all 3 '>'
*/
/* I am a C programmer. Walking through a string is natural for me and more efficient
*/
for ($i=0; $i < $tam; $i++){
// If I find a '<', $tag++ and continue without copying
if ($string[$i] == '<'){
$tag++;
continue;
}
// if I find a '>', decrease $tag and continue
if ($string[$i] == '>'){
if ($tag){
$tag--;
}
/* $tag is never negative. If the string is "<b>test</b>>"
(an error, of course) $tag will stop at 0
*/
continue;
}
// if $tag is 0, we can copy
if ($tag == 0){
$newstring .= $string[$i]; // simple copy, only one character
}
}
return $newstring;
}
echo theRealStripTags2("<tag>test</tag>");
// return "test"
?>
elgios at gmail dot com
04-Aug-2006 08:24
I think the new function works.
function theRealStripTags2($string)
{
$tam=strlen($string);
// $tam holds the number of characters in the string
$newstring="";
// $newstring will be returned
$tag=0;
/* $tag == 0 => copy the character from $string to $newstring
$tag > 0 => don't copy. We found one or more '<' and
need to find '>'. If we find 3 '<' we need to find
all 3 '>'
*/
/* I am a C programmer. Walking through a string is natural for me
and more efficient.
Problem: copying a string to another string is more
efficient but uses more memory!!!
*/
for ($i=0; $i < $tam; $i++){
/* If I find a '<', $tag++ and continue without copying */
if ($string[$i] == '<'){
$tag++;
continue;
}
/* if I find a '>', decrease $tag and continue */
if ($string[$i] == '>'){
if ($tag){
$tag--;
}
/* $tag is never negative. If the string is "<b>test</b>>" (an error, of course)
$tag stops at 0
*/
continue;
}
/* if $tag is 0, we can copy */
if ($tag == 0){
$newstring .= $string[$i]; // simple copy, only one character
}
}
return $newstring;
}
Sébastien
23-May-2006 08:22
Hmm, it seems that your function theRealStripTags won't behave correctly in some cases, for example:
<?php
theRealStripTags("<!-- I want to put a <div>tag</div> -->");
theRealStripTags("<!-- Or a carrot > -->");
theRealStripTags("<![CDATA[what about this! It's to protect from HTML characters like <tag>, > and so on in XML, no?]]> -->");
?>
xyexz at yahoo dot com
09-May-2006 08:41
I have found that this function sometimes removes only the first angle bracket ("<") of a tag and leaves the rest of the tag in the string, which obviously isn't what I'm looking for.
EX:
<?php
//Returns "tag>test/tag>"
echo strip_tags("<tag>test</tag>");
?>
I'm calling strip_tags on a string I'm importing from XML, so perhaps it has something to do with that, but if you've run into the same issue, I've written a function to fix it once and for all!
<?php
function theRealStripTags($string)
{
//while there are tags left to remove
while(strstr($string, '>'))
{
//find position of the first '<'
$currentBeg = strpos($string, '<');
//find position of the first '>'
$currentEnd = strpos($string, '>');
//find out if there is text before the '<'
//if so save it in $tmpStringBeg
$tmpStringBeg = @substr($string, 0, $currentBeg);
//find out if there is text after the '>'
//if so save it in $tmpStringEnd
$tmpStringEnd = @substr($string, $currentEnd + 1, strlen($string));
//cut the tag from the string
$string = $tmpStringBeg.$tmpStringEnd;
}
return $string;
}
//Returns "test"
echo theRealStripTags('<tag>test</tag>');
?>
