Last updated: Fri, 17 May 2013


strip_tags

(PHP 4, PHP 5)

strip_tags - Strip HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function removes all NUL bytes and all HTML and PHP tags from the given string (str). It uses the same tag-stripping algorithm as the fgetss() function.

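Parameters

str

The input string.

allowable_tags

The optional second parameter specifies tags which should not be stripped.

Note:

HTML comments and PHP tags are also stripped. This behavior is hardcoded and cannot be changed with allowable_tags.

Note:

This parameter should not contain whitespace. strip_tags() matches tags case-insensitively and treats everything from < up to the first whitespace character or > as the tag. This means that strip_tags("<br/>", "<br>") returns an empty string.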
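As a rough illustration of the case-insensitive matching described above, the following sketch keeps a <b> element written in upper case while removing a tag that is not listed in allowable_tags. The sample string and the variable names are invented for this example.

```php
<?php
// Illustrative input: the markup below is made up for this sketch.
$html = '<B>bold</B> and <i>italic</i>';

// Only <b> is allowed; matching ignores case, so <B> and </B> are kept.
$result = strip_tags($html, '<b>');

echo $result, "\n"; // prints: <B>bold</B> and italic
```

Note that the allowed tag is written without whitespace, as the parameter description requires.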

Return Values

Returns the stripped string.

Changelog

Version  Description
5.0.0    strip_tags() is now binary safe.
4.3.0    HTML comments are now also stripped.

Example #1 strip_tags() example

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
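A small sketch of the effect described in this warning, assuming a made-up input in which < and > are used as comparison operators rather than markup. Everything between the stray < and the next > is treated as a tag and dropped.

```php
<?php
// Hypothetical user input: "<10" is not markup, but "<" is followed directly
// by a non-whitespace character, so strip_tags() treats it as a tag start.
$input = 'Discount if 5 <10 and 20> 15 applies';

$output = strip_tags($input);

// The text between "<" and ">" is removed along with the brackets,
// so both checks are expected to be true.
var_dump(strpos($output, 'and') === false);
var_dump(strlen($output) < strlen($input));
```

If such input is expected, escaping or spacing out the comparison operators before calling strip_tags() avoids the loss.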

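Warning

This function does not modify any attributes on the tags that you allow using allowable_tags. This includes the style and onmouseover attributes, which a malicious user may abuse when posting text that will be shown to other users.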
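The following sketch shows what this warning means in practice, using an invented comment string: the allowed <a> tag comes through with its event-handler attribute intact.

```php
<?php
// Made-up comment text from an untrusted user.
$comment = '<a href="#" onmouseover="alert(1)">hover me</a><script>x()</script>';

// <a> is allowed, so its whole opening tag is kept, attributes included.
// The <script> tags are removed, but the text between them is not.
$clean = strip_tags($comment, '<a>');

// The event-handler attribute is still present in the result.
var_dump(strpos($clean, 'onmouseover') !== false);
```

For this reason strip_tags() with allowable_tags is not sufficient on its own for sanitizing untrusted HTML.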

Note:

Tag names in the input HTML that are longer than 1023 bytes are treated as invalid, even if they are listed in the allowable_tags parameter.

See Also



User Contributed Notes strip_tags - [19 notes]
CEO at CarPool2Camp dot org
4 years ago
Note the different outputs from different versions of the same tag:

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';

$new = strip_tags($data, '<br>');
var_dump($new); // OUTPUTS string(21) "<br>EachNew<br />Line"

$new = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"

$new = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>
tom at cowin dot us
2 years ago
With most web-based user input of more than a line of text, it seems about 90% of what I get is pasted from Word. I've developed this function over time to try to strip all of that cruft out. A few things I do here are application specific, but if it helps you, great; if you can improve on it or have a better way, please post it.

<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
    mb_regex_encoding('UTF-8');

    // Replace MS special characters first.
    $search  = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);

    // Make sure _all_ HTML entities are converted to their plain equivalents:
    // in some MS headers, some entities are encoded and some aren't.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Strip any C-style comments first, since these, embedded in HTML comments,
    // seem to prevent strip_tags() from removing the HTML comments
    // (a combination MS Word introduces).
    if (mb_stripos($text, '/*') !== FALSE) {
        // preg_replace() is used because mb_eregi_replace() patterns take no delimiters.
        $text = preg_replace('#/\*.*?\*/#s', '', $text);
    }

    // Put a space into arithmetic expressions that strip_tags() could mistake
    // for tags: '<1' becomes '< 1' (note: somewhat application specific).
    $text = preg_replace('/<([0-9]+)/', '< $1', $text);

    $text = strip_tags($text, $allowed_tags);

    // Remove whitespace runs at the start and end, and collapse any other run
    // of two or more whitespace characters to a single space.
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);

    // Strip inline CSS and simplify style tags.
    // The \b keeps the <b> pattern from also matching <br>.
    $search  = array('#<(strong|b)\b[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)\b[^>]*>(.*?)</(em|i)>#isu', '#<u\b[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);

    // On some newer MS Word exports, where you get conditionals of the form
    // 'if gte mso 9', something inside an HTML comment prevents strip_tags()
    // from removing the comment that holds the MS style definitions.
    // This last step removes any leftover comments, one comment at a time.
    if (preg_match('/<!--/u', $text)) {
        $text = preg_replace('/<!--.*?-->/isu', '', $text);
    }
    return $text;
}
?>
admin at automapit dot com
6 years ago
<?php
function html2txt($document)
{
    // Patterns are applied in order, so whole <script> and <style> blocks
    // are removed before the generic tag pattern runs.
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<style[^>]*?>.*?</style>@si',   // Strip style tags properly
        '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@'       // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>

This function turns HTML into text: it strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.

It's a Frankenstein function I made from bits picked up on my travels through the web. Thanks to the many who have unwittingly contributed!
mariusz.tarnaski at wp dot pl
4 years ago
Hi. I made a function that removes the HTML tags along with their contents:

Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  // Extract the tag names from a list such as '<b><div>'
  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if (is_array($tags) AND count($tags) > 0) {
    if ($invert == FALSE) {
      // Remove every element except the listed ones
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      // Remove only the listed elements
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif ($invert == FALSE) {
    // No tags given: remove all elements together with their contents
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
 text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE):
 text with <div>tags</div>

I hope someone finds this useful :)
bzplan at web dot de
7 months ago
Given HTML code like this:

<?php
$html = '
<div>
<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>
<p>material is wood</p>
</div>
';
?>

with <?php $str = strip_tags($html); ?>
... the result is:

$str = 'color is bluesize is huge
material is wood';

Notice that the words 'blue' and 'size' run together :(
and the line breaks are still in the new string $str.

If you need a space between the words (and no line breaks),
use my function: <?php $str = rip_tags($html); ?>
... the result is:

$str = 'color is blue size is huge material is wood';

the function:

<?php
// --------------------------------------------------------------

function rip_tags($string) {

    // ----- remove HTML TAGs -----
    $string = preg_replace('/<[^>]*>/', ' ', $string);

    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);    // --- remove entirely
    $string = str_replace("\n", ' ', $string);   // --- replace with space
    $string = str_replace("\t", ' ', $string);   // --- replace with space

    // ----- remove multiple spaces -----
    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    return $string;

}

// --------------------------------------------------------------
?>

The KEY is to use the regex pattern '/<[^>]*>/' with a space as the replacement
instead of strip_tags(),
... and then to remove control characters and multiple spaces
:)
cesar at nixar dot org
7 years ago
Here is a recursive function for strip_tags(), like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
brettz9 AAT yah
4 years ago
strip_tags() also works on the shortened <?...?> syntax, and thus will also remove XML processing instructions.
kai at froghh dot de
4 years ago
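A function that decides whether < is the start of a tag or a "less than" / "less than or equal" sign:

<?php
function lt_replace($str){
    // '/', '!' and '?' are excluded so that closing tags, comments and
    // PHP tags are left for strip_tags() to handle.
    return preg_replace("/<([^[:alpha:]\/!?])/", '&lt;\\1', $str);
}
?>

It is meant to be used before strip_tags().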
cyex at hotmail dot com
2 years ago
I thought someone else might find this useful... a simple way to strip BBCode:

<?php

$bbcode_str = "Here is some [b]bold text[/b] and some [color=#FF0000]red text[/color]!";

$plain_text = strip_tags(str_replace(array('[',']'), array('<','>'), $bbcode_str));

//Outputs: Here is some bold text and some red text!

?>
salavert at~ akelos
7 years ago
<?php
/**
 * Works like PHP function strip_tags, but it only removes selected tags.
 * Example:
 *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
 */
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        // preg_quote() and \b keep a name such as 'b' from also matching <br>.
        $tag = preg_quote($tag, '/');
        if (preg_match_all('/<'.$tag.'\b[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }

    return $text;
}
?>

Hope you find it useful,

Jose Salavert
jausions at php dot net
6 years ago
To sanitize any user input, you should also consider PEAR's HTML_Safe package.

http://pear.php.net/package/HTML_Safe
Sam
3 months ago
Note that JavaScript comments are also removed by strip_tags(): stripping the tags from the following sample returns an empty string.

<script type="text/javascript">
   // Add custom parameters here.
</script>
hongong at webafrica dot org dot za
4 years ago
An easy way to clean a string of all CDATA encapsulation.

<?php
function strip_cdata($string)
{
    preg_match_all('/<!\[cdata\[(.*?)\]\]>/is', $string, $matches);
    return str_replace($matches[0], $matches[1], $string);
}
?>

Example: echo strip_cdata('<![CDATA[Text]]>');
Returns: Text
southsentry at yahoo dot com
4 years ago
I was looking for a simple way to ban HTML from review posts and the like. I have seen a few classes that do it. This line, while it doesn't strip the post, effectively blocks people from posting HTML in reviews and other forms.

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
}
?>

If you also want to catch the tricksters who use & for HTML links, include this:

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
} elseif (strpos($review, "&") !== false) {
    return 5;
}
?>

I hope this helps someone out!
chrisj at thecyberpunk dot com
11 years ago
strip_tags() doesn't recognize that CSS within style tags is not document text. To fix this, do something similar to the following:

$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
Liam Morland
4 years ago
Here is a suggestion for getting rid of attributes: After you run your HTML through strip_tags(), use the DOM interface to parse the HTML. Recursively walk through the DOM tree and remove any unwanted attributes. Serialize the DOM back to the HTML string.

Don't make the default-permit mistake: make a list of the attributes you want to ALLOW and remove all others, rather than removing a specific list of attributes, which may be missing something important.
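A minimal sketch of that approach, assuming PHP 5.3.6 or later (for DOMDocument::saveHTML() with a node argument) and ASCII input. The function name keep_allowed_attributes() and the sample markup are hypothetical and exist only for this illustration; this is one possible way to do it, not a vetted sanitizer.

```php
<?php
// Hypothetical helper (not a PHP built-in): keep only whitelisted attributes.
function keep_allowed_attributes($html, array $allowed)
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<div>' . $html . '</div>');   // wrapper gives the fragment one root
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Walk every element in the parsed tree.
    foreach ($doc->getElementsByTagName('*') as $element) {
        // Copy the names first: removing attributes while iterating the
        // live attribute map could skip entries.
        $names = array();
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowed, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Made-up input: strip unwanted tags first, then unwanted attributes.
$dirty = '<a href="#top" onmouseover="alert(1)" style="color:red">top</a>';
$clean = keep_allowed_attributes(strip_tags($dirty, '<a>'), array('href'));
echo $clean, "\n";
```

Allowed attributes can still carry dangerous values (for example a javascript: URL in href), so attribute values would need their own checks in a real application.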
Anonymous User
8 years ago
Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.

For example,

"<strong>This is a bit of text</strong><p />Followed by this bit"

are separate paragraphs visually, but if simply stripped of tags will result in

"This is a bit of textFollowed by this bit"

which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.

The workaround is to force whitespace prior to stripping, using something like this:

<?php
$text = getTheText();
$text = preg_replace('/</', ' <', $text);
$text = preg_replace('/>/', '> ', $text);
$desc = html_entity_decode(strip_tags($text));
$desc = preg_replace('/[\n\r\t]/', ' ', $desc);
// Collapse every run of two or more spaces, not just pairs
$desc = preg_replace('/ {2,}/', ' ', $desc);
?>
Abdul Al-hasany
2 years ago
As noted in the documentation, strip_tags() strips PHP and comment tags even if they are added to $allowable_tags.

Here is a little workaround for this issue:
<?php
function stripTags($text, $tags)
{
  // replace php and comment tags so they do not get stripped
  $text = preg_replace("@<\?@", "#?#", $text);
  $text = preg_replace("@<!--@", "#!--#", $text);

  // strip tags normally
  $text = strip_tags($text, $tags);

  // return php and comment tags to their original form
  $text = preg_replace("@#\?#@", "<?", $text);
  $text = preg_replace("@#!--#@", "<!--", $text);

  return $text;
}
?>

The function replaces the opening <? and <!-- markers with hash-delimited placeholders so strip_tags() does not identify them as tags, and after strip_tags() has done its job the placeholders are changed back to their original form.
Kalle Sommer Nielsen
5 years ago
This adds a lot of missing JavaScript events to the strip_tags_attributes() function from the entries below.

Props to MSDN for lots of them ;)

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
    {
        $sStripped = strip_tags($sSource, implode('', $aAllowedTags));
        if (empty($aDisabledAttributes)) return $sStripped;

        // Applied to the inside of each remaining tag: drop javascript: URLs,
        // drop the disabled attributes, then collapse whitespace.
        $aPatterns = array(
            '/javascript:[^"\']*/i',
            '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i',
            '/\s+/'
        );

        // preg_replace_callback() (PHP 5.3+ closure) replaces the /e modifier,
        // which is deprecated as of PHP 5.5.
        return preg_replace_callback('/<(.*?)>/i', function ($aMatch) use ($aPatterns) {
            return '<' . preg_replace($aPatterns, array('', '', ' '), $aMatch[1]) . '>';
        }, $sStripped);
    }
?>

 