strtok

(PHP 4, PHP 5, PHP 7, PHP 8)

strtok — 标记分割字符串

说明

function strtok(string $string, string $token): string|false

可选 token （不支持命名参数）：

function strtok(string $token): string|false

strtok() 将字符串 string 分割为若干子字符串，每个子字符串以 token 中的字符分割。这也就意味着，如果有个字符串是 "This is an example string"，你可以使用空格字符作为 token，将这句话分割成独立的单词。

注意，仅第一次调用 strtok 函数时才使用 string 参数。随后每次调用 strtok 都将只使用 token 参数，因为它会记住它在当前字符串中的位置。如果要重新开始分割一个新的字符串，你需要再次使用 string 参数来调用 strtok 函数来对其进行初始化。注意，可以在 token 参数中使用多个字符。字符串将被 token 参数中任何一个字符分割。

注意:
这个函数的行为可能与熟悉 explode() 的人期望略有不同。首先，在解析的字符串中，两个或多个连续的 token 的字符被认为是单一的分隔符。此外，位于字符串开始或结束处的 token 将被忽略。例如，如果使用一个字符串 ";aaa;;bbb;" ，连续调用 strtok() 并将 ";" 作为一个 token 将返回字符串 "aaa" 和 "bbb"，然后返回 false 。因此，字符串将仅被拆分为两个元素，而 explode(";", $string) 将返回一个包含 5 个元素的数组。

参数

string: 被分成若干子字符串的原始字符串。
token: 分割 string 时使用的分界字符。

返回值

标记后的字符串，如果没有更多标记可用，则返回 false 。

更新日志

版本	说明
8.3.0	现在，当未提供 `token` 时，会抛出 `E_WARNING`。

示例

示例 #1 strtok() 示例

<?php
$string = "This is\tan example\nstring";
/* 使用制表符和换行符作为分界符 */
$tok = strtok($string, " \n\t");

while ($tok !== false) {
    echo "Word={$tok}\n";
    $tok = strtok(" \n\t");
}
?>

示例 #2 当 strtok() 找不到标记时的反应

<?php
$first_token  = strtok('/something', '/');
$second_token = strtok('/');
var_dump($first_token, $second_token);
?>

以上示例会输出：

string(9) "something"
    bool(false)

示例 #3 strtok() 和 explode() 的不同点

<?php
$string = ";aaa;;bbb;";
$parts = [];
$tok = strtok($string, ";");
while ($tok !== false) {
    $parts[] = $tok;
    $tok = strtok(";");
}
echo json_encode($parts),"\n";
$parts = explode(";", $string);
echo json_encode($parts),"\n";

以上示例会输出：

["aaa","bbb"]
["","aaa","","bbb",""]

注释

警告

此函数可能返回布尔值 false，但也可能返回等同于 false 的非布尔值。请阅读布尔类型章节以获取更多信息。应使用 === 运算符来测试此函数的返回值。

参见

explode() - 使用一个字符串分割另一个字符串

发现了问题？

了解如何改进此页面 • 提交拉取请求 • 报告一个错误

＋添加备注

用户贡献的备注 20 notes

down

eep2004 at ukr dot net ¶

12 years ago

<?php
// strtok example
$str = 'Hello to all of Ukraine';
echo strtok($str, ' ').' '.strtok(' ').' '.strtok(' ');
?>
Result:
Hello to all

down

elarlang at gmail dot com ¶

15 years ago

If you have memory-usage critical solution, you should keep in mind, that strtok function holds input string parameter (or reference to it?) in memory after usage.

<?php
function tokenize($str, $token_symbols) {
    $word = strtok($str, $token_symbols);
    while (false !== $word) {
        // do something here...
        $word = strtok($token_symbols);
    }
}
?>
Test-cases with handling ~10MB plain-text file:
Case #1 - unset $str variable
<?php
$token_symbols = " \t\n";
$str = file_get_contents('10MB.txt'); // mem usage 9.75383758545 MB (memory_get_usage() / 1024 / 1024)); 
tokenize($str, $token_symbols); // mem usage 9.75400161743 MB
unset($str); // 9.75395584106 MB
?>
Case #1 result: memory is still used

Case #2 - call strtok again
<?php
$token_symbols = " \t\n";
$str = file_get_contents('10MB.txt'); // 9.75401306152 MB
tokenize($str, $token_symbols); // 9.75417709351
strtok('', ''); // 9.75421524048
?>
Case #2 result: memory is still used

Case #3 - call strtok again AND unset $str variable
<?php
$token_symbols = " \t\n";
$str = file_get_contents('10MB.txt'); // 9.75410079956 MB
tokenize($str, $token_symbols); // 9.75426483154 MB
unset($str);
strtok('', ''); // 0.0543975830078 MB
?>
Case #3 result: memory is free

So, better solution for tokenize function:
<?php
function tokenize($str, $token_symbols, $token_reset = true) {
    $word = strtok($str, $token_symbols);
    while (false !== $word) {
        // do something here...
        $word = strtok($token_symbols);
    }

    if($token_reset)
        strtok('', '');
}
?>

down

manicdepressive at mindless dot com ¶

22 years ago

<pre><?php
/** get leading, trailing, and embedded separator tokens that were 'skipped'
if for some ungodly reason you are using php to implement a simple parser that 
needs to detect nested clauses as it builds a parse tree */

$str = "(((alpha(beta))(gamma))";

$seps = '()';
$tok = strtok( $str,$seps ); // return false on empty string or null
$cur = 0;      
$dumbDone = FALSE;
$done = (FALSE===$tok);
while (!$done) {
   // process skipped tokens (if any at first iteration) (special for last)
   $posTok = $dumbDone ? strlen($str) : strpos($str, $tok, $cur );
   $skippedMany = substr( $str, $cur, $posTok-$cur ); // false when 0 width
   $lenSkipped = strlen($skippedMany); // 0 when false
   if (0!==$lenSkipped) {
      $last = strlen($skippedMany) -1;
      for($i=0; $i<=$last; $i++){
         $skipped = $skippedMany[$i];
         $cur += strlen($skipped);
         echo "skipped: $skipped\n";
      }
   }
   if ($dumbDone) break; // this is the only place the loop is terminated

   // process current tok
   echo "curr tok: ".$tok."\n";

   // update cursor
   $cur += strlen($tok);

   // get any next tok
   if (!$dumbDone){
      $tok = strtok($seps);
      $dumbDone = (FALSE===$tok); 
      // you're not really done till you check for trailing skipped
   }
};
?></pre>

down

eep2004 at ukr dot net ¶

11 years ago

Remove GET variables from the URL
<?php
echo strtok('http://example.com/index.php?foo=1&bar=2', '?');
?>
Result:
http://example.com/index.php

down

benighted at gmail dot com ¶

16 years ago

Simple way to tokenize search parameters, including double or single quoted keys.  If only one quote is found, the rest of the string is assumed to be part of that token.

<?php
            $token = strtok($keywords,' ');
            while ($token) {
                // find double quoted tokens
                if ($token{0}=='"') { $token .= ' '.strtok('"').'"'; }
                // find single quoted tokens
                if ($token{0}=="'") { $token .= ' '.strtok("'")."'"; }

                $tokens[] = $token;
                $token = strtok(' ');
            }
?>

Use substr(1,strlen($token)) and remove the part that adds the trailing quotes if you want your output without quotes.

down

info at maisuma dot jp ¶

12 years ago

If you want to tokenize by only one letter, explode() is much faster compared to strtok().

<?php
$str=str_repeat('foo ',10000);

//explode()
$time=microtime(TRUE);
$arr=explode($str,' ');
$time=microtime(TRUE)-$time;
echo "explode():$time sec.".PHP_EOL;

//strtok()
$time=microtime(TRUE);
$ret=strtok(' ',$str);
while($ret!==FALSE){
    $ret=strtok(' ');
}
$time=microtime(TRUE)-$time;
echo "strtok():$time sec.".PHP_EOL;

?>

The result is : (PHP 5.3.3 on CentOS)

explode():0.001317024230957 sec.
strtok():0.0058917999267578 sec.

explode() is about five times fast in short strings, too.

down

Logikos ¶

17 years ago

This looks very simple, but it took me a long time to figure out so I thought I'd share it incase someone else was wanting the same thing:

this should work similar to substr() but with tokens instead!

<?php
/* subtok(string,chr,pos,len)
 *
 * chr = chr used to seperate tokens
 * pos = starting postion
 * len = length, if negative count back from right
 * 
 *  subtok('a.b.c.d.e','.',0)     = 'a.b.c.d.e'
 *  subtok('a.b.c.d.e','.',0,2)   = 'a.b'
 *  subtok('a.b.c.d.e','.',2,1)   = 'c'
 *  subtok('a.b.c.d.e','.',2,-1)  = 'c.d'
 *  subtok('a.b.c.d.e','.',-4)    = 'b.c.d.e'
 *  subtok('a.b.c.d.e','.',-4,2)  = 'b.c'
 *  subtok('a.b.c.d.e','.',-4,-1) = 'b.c.d'
 */
function subtok($string,$chr,$pos,$len = NULL) {
  return implode($chr,array_slice(explode($chr,$string),$pos,$len));
}
?>

explode breaks the tokens up into an array, array slice alows you to pick then tokens you want, and then implode converts it back to a string

although its far from a clone, this was inspired by mIRC's gettok() function

down

Axeia ¶

12 years ago

Might be pointing out the obvious but if you'd rather use a for loop rather than a while (to keep the token strings on the same line for readability for example), it can be done. Added bonus, it doesn't put a $tok variable outside the loop itself either.
Downside however is that you're not able to manually free up the memory used using the technique mentioned by elarlang.

<?php
for($tok = strtok($str, ' _-.'); $tok!==false; $tok = strtok(' _-.'))
{
  echo "$tok </br>";
}
?>

down

gilthans at NOSPAM dot gmail dot com ¶

14 years ago

Note that strtok may receive different tokens each time. Therefore, if, for example, you wish to extract several words and then the rest of the sentence:

<?php
$text = "13 202 5 This is a long message explaining the error codes.";
$error1 = strtok($text, " "); //13
$error2 = strtok(" "); //202
$error3 = strtok(" "); //5
$error_message = strtok(""); //Notice the different token parameter
echo $error_message; //This is a long message explaining the error codes.
?>

down

KrazyBox ¶

17 years ago

As of the change in strtok()'s handling of empty strings, it is now useless for scripts that rely on empty data to function.

Take for instance, a standard header. (with UNIX newlines)

http/1.0 200 OK\n
Content-Type: text/html\n
\n
--HTML BODY HERE---

When parsing this with strtok, one would wait until it found an empty string to signal the end of the header. However, because strtok now skips empty segments, it is impossible to know when the header has ended.
This should not be called `correct' behavior, it certainly is not. It has rendered strtok incapable of (properly) processing a very simple standard.

This new functionality, however, does not affect Windows style headers. You would search for a line that only contains "\r"
This, however, is not a justification for the change.

down

azeem ¶

16 years ago

Here is a java like StringTokenizer class using strtok function:

<?php

/**
 * The string tokenizer class allows an application to break a string into tokens.
 *
 * @example The following is one example of the use of the tokenizer. The code:
 * <code>
 * <?php
 *    $str = 'this is:@\t\n a test!';
 *    $delim = ' !@:'\t\n; // remove these chars
 *    $st = new StringTokenizer($str, $delim);
 *    while ($st->hasMoreTokens()) {
 *        echo $st->nextToken() . "\n";
 *    }
 *    prints the following output:
 *      this
 *      is
 *      a
 *      test
 * ?>
 * </code>
 */
class StringTokenizer {

    /**
     * @var string
     */
    private $token;

    /**
     * @var string
     */
    private $delim;
    /**
     * Constructs a string tokenizer for the specified string
     * @param string $str String to tokenize
     * @param string $delim The set of delimiters (the characters that separate tokens)
     * specified at creation time, default to ' '
     */
    public function __construct(/*string*/ $str, /*string*/ $delim = ' ') {
        $this->token = strtok($str, $delim);
        $this->delim = $delim;
    }

    public function __destruct() {
        unset($this);
    }

    /**
     * Tests if there are more tokens available from this tokenizer's string. It
     * does not move the internal pointer in any way. To move the internal pointer
     * to the next element call nextToken()
     * @return boolean - true if has more tokens, false otherwise
     */
    public function hasMoreTokens() {
        return ($this->token !== false);
    }

    /**
     * Returns the next token from this string tokenizer and advances the internal
     * pointer by one.
     * @return string - next element in the tokenized string
     */
    public function nextToken() {
        $current = $this->token;
        $this->token = strtok($this->delim);
        return $current;
    }
}
?>

down

voojj3054 at gmail dot com ¶

3 years ago

Hello, portuguese documentation of strtok is wrong, at this part which the example(2) is wrong.

Exemplo #2 Comportamento antigo da strtok()
<?php
$first_token  = strtok('/something', '/');
$second_token = strtok('/');
var_dump ($first_token, $second_token);
?>

O exemplo acima produzirá:

    string(0) ""
    string(9) "something"

(this example above, should be inverted as this:)

Correct: 
    string(9) "something"
    string(0) ""

(exemple 3 is correct)
Exemplo #3 Novo comportamento da strtok()
<?php
$first_token  = strtok('/something', '/');
$second_token = strtok('/');
var_dump ($first_token, $second_token);
?>

O exemplo acima produzirá:

    string(9) "something"
    bool(false)

down

pradador at me dot com ¶

15 years ago

Here's a simple class that allows you to iterate through string tokens using a foreach loop.

<?php
/**
 * The TokenIterator class allows you to iterate through string tokens using
 * the familiar foreach control structure.
 * 
 * Example:
 * <code>
 * <?php
 * $string = 'This is a test.';
 * $delimiters = ' ';
 * $ti = new TokenIterator($string, $delimiters);
 * 
 * foreach ($ti as $count => $token) {
 *     echo sprintf("%d, %s\n", $count, $token); 
 * }
 * 
 * // Prints the following output:
 * // 0. This
 * // 1. is
 * // 2. a
 * // 3. test.
 * </code>
 */
class TokenIterator implements Iterator
{
    /**
     * The string to tokenize.
     * @var string
     */
    protected $_string;
    
    /**
     * The token delimiters.
     * @var string
     */
    protected $_delims;
    
    /**
     * Stores the current token.
     * @var mixed
     */
    protected $_token;
    
    /**
     * Internal token counter.
     * @var int
     */
    protected $_counter = 0;
    
    /**
     * Constructor.
     * 
     * @param string $string The string to tokenize.
     * @param string $delims The token delimiters.
     */
    public function __construct($string, $delims)
    {
        $this->_string = $string;
        $this->_delims = $delims;
        $this->_token = strtok($string, $delims);
    }
    
    /**
     * @see Iterator::current()
     */
    public function current()
    {
        return $this->_token;
    }

    /**
     * @see Iterator::key()
     */
    public function key()
    {
        return $this->_counter;
    }

    /**
     * @see Iterator::next()
     */
    public function next()
    {
        $this->_token = strtok($this->_delims);
        
        if ($this->valid()) {
            ++$this->_counter;
        }
    }

    /**
     * @see Iterator::rewind()
     */
    public function rewind()
    {
        $this->_counter = 0;
        $this->_token   = strtok($this->_string, $this->_delims);
    }

    /**
     * @see Iterator::valid()
     */
    public function valid()
    {
        return $this->_token !== FALSE;
    }
}
?>

down

bohwaz ¶

2 years ago

Please note that strtok memory is shared between all PHP code currently executed, even included files. This can bite you in unexpected ways if you are not careful.

For example:

<?php

$path = 'dir/file.ext';
$dir_name = strtok($path, '/');

if ($dir_name !== (new Module)->getAllowedDirName()) {
  throw new \Exception('Invalid directory name');
}

$file_name = strtok('');

?>

Seems easy enough, but if your Module class is not loaded, this triggers the autoloader. The autoloader *MAY* use strtok inside its loading code.

Or your Module class *MAY* use strtok inside its constructor.

This means you will never get your $file_name correctly.

So: you should *always* group strtok calls, without any external code between two strtok calls.

This would be OK:

<?php

$path = 'dir/file.ext';
$dir_name = strtok($path, '/');
$file_name = strtok('');

if ($dir_name !== (new Module)->getAllowedDirName()) {
  throw new \Exception('Invalid directory name');
}

?>

This might cause issues:

<?php

$path = 'one/two#three';
$a = strtok($path, '/');
$b = strtok(Module::NAME_SEPARATOR);
$c = strtok('');

?>

Because your autoloader might be using strtok.

This would be avoided by fetching all parameters used in strtok *before* the calls:

<?php

$path = 'one/two#three';
$separator = Module::NAME_SEPARATOR;
$a = strtok($path, '/');
$b = strtok($separator);
$c = strtok('');

?>

down

heiangus at hotmail dot com ¶

7 years ago

I found this useful for parsing user entered links in text fields.

e.g. This is a link <http://example.com>

    function parselink($link) {
        $bit1 = trim(strtok($link, '<'));
        $bit2 = trim(strtok('>')); 
        $html = '<a href="'.$bit2.'">'.$bit1.'</a>';
        return $html; // <a href="http://example.com">This is a link</a>
    }

down

-1

David Spector ¶

5 years ago

After obtaining zero or more tokens with calls to strtok, you can obtain the remainder of the input string by calling strtok with an empty string as the delimiter.

down

charlie dot ded at orange dot fr ¶

9 years ago

@maisuma you invert paramaters of explode() and strtok() functions, your code does not do what you expect.
You expect to read the input string token after token so equivalent code for strtok() is  arra_filter(explode()) because explode() return lines of empty string when several delimiters are contiguous in the read string, for example  2 whitespaces between.

In fact strtok() is much faster (x2 at least) than arra_filter(explode()) if the read string contains several contiguous delimiters , 
 it is slower if the read string contains one and only one delimiter between tokens.

<?php

$repeat = 10;
$delimiter = ':';
$str=str_repeat('foo:',$repeat);

$timeStrtok=microtime(TRUE);
$token = strtok($str, $delimiter);
while($token!==FALSE){
    //echo $token . ',';
    $token=strtok($delimiter); 
}
$timeStrtok -=microtime(TRUE);

$timeExplo=microtime(TRUE);
$arr = explode($delimiter, $str);
//$arr = array_filter($arr);
$timeExplo -=microtime(TRUE);

//print_r($arr);

$X = 1000000; $unit = 'microsec';

echo PHP_EOL . ' explode() : ' . -$timeExplo . ' ' .$unit .PHP_EOL . ' strtok() :   ' . -$timeStrtok . ' ' . $unit .PHP_EOL;

$timeExplo=round(-$timeExplo*$X);
$timeStrtok=round(-$timeStrtok*$X);

echo PHP_EOL . ' explode() : ' . $timeExplo . ' ' .$unit .PHP_EOL . ' strtok() :   ' . $timeStrtok . ' ' . $unit .PHP_EOL;
echo ' ratio explode / strtok : ' . round($timeExplo / $timeStrtok,1) . PHP_EOL;

?>

down

fabiolimasouto at gmail dot com ¶

15 years ago

this example will hopefully help you understand how this function works:

<?php
$selector = 'div.class#id';
$tagname = strtok($selector,'.#');
echo $tagname.'<br/>';

while($tok = strtok('.#'))
{
 echo $tok.'<br/>';
}

?>

Outputs:
div
class
id

down

-2

eep2004 at ukr dot net ¶

11 years ago

Remove GET variables from the URL
<?php
$url = strtok('http://php.net/manual/en/ref.strings.php?foo=1&bar=2', '?');
// $url = 'http://php.net/manual/en/ref.strings.php'

down

-2

mac.com@nemo ¶

20 years ago

This function takes a string and returns an array with words (delimited by spaces), also taking into account quotes, doublequotes, backticks and backslashes (for escaping stuff).
So

$string = "cp   'my file' to `Judy's file`";
var_dump(parse_cli($string));

would yield:

array(4) {
  [0]=>
  string(2) "cp"
  [1]=>
  string(7) "my file"
  [2]=>
  string(5) "to"
  [3]=>
  string(11) "Judy's file"
}

Way it works, runs through the string character by character, for each character looking up the action to take, based on that character and its current $state.
Actions can be (one or more of) adding the character/string to the current word, adding the word to the output array, and changing or (re)storing the state.
For example a space will become part of the current 'word' (or 'token') if $state is 'doublequoted', but it will start a new token if $state was 'unquoted'.
I was later told it's a "tokeniser using a finite state automaton". Who knew :-)

<?php

#_____________________
# parse_cli($string) /
function parse_cli($string) {
    $state = 'space';
    $previous = '';     // stores current state when encountering a backslash (which changes $state to 'escaped', but has to fall back into the previous $state afterwards)
    $out = array();     // the return value
    $word = '';
    $type = '';         // type of character
    // array[states][chartypes] => actions
    $chart = array(
        'space'        => array('space'=>'',   'quote'=>'q',  'doublequote'=>'d',  'backtick'=>'b',  'backslash'=>'ue', 'other'=>'ua'),
        'unquoted'     => array('space'=>'w ', 'quote'=>'a',  'doublequote'=>'a',  'backtick'=>'a',  'backslash'=>'e',  'other'=>'a'),
        'quoted'       => array('space'=>'a',  'quote'=>'w ', 'doublequote'=>'a',  'backtick'=>'a',  'backslash'=>'e',  'other'=>'a'),
        'doublequoted' => array('space'=>'a',  'quote'=>'a',  'doublequote'=>'w ', 'backtick'=>'a',  'backslash'=>'e',  'other'=>'a'),
        'backticked'   => array('space'=>'a',  'quote'=>'a',  'doublequote'=>'a',  'backtick'=>'w ', 'backslash'=>'e',  'other'=>'a'),
        'escaped'      => array('space'=>'ap', 'quote'=>'ap', 'doublequote'=>'ap', 'backtick'=>'ap', 'backslash'=>'ap', 'other'=>'ap'));
    for ($i=0; $i<=strlen($string); $i++) {
        $char = substr($string, $i, 1);
        $type = array_search($char, array('space'=>' ', 'quote'=>'\'', 'doublequote'=>'"', 'backtick'=>'`', 'backslash'=>'\\'));
        if (! $type) $type = 'other';
        if ($type == 'other') {
            // grabs all characters that are also 'other' following the current one in one go
            preg_match("/[ \'\"\`\\\]/", $string, $matches, PREG_OFFSET_CAPTURE, $i);
            if ($matches) {
                $matches = $matches[0];
                $char = substr($string, $i, $matches[1]-$i); // yep, $char length can be > 1
                $i = $matches[1] - 1;
            }else{
                // no more match on special characters, that must mean this is the last word!
                // the .= hereunder is because we *might* be in the middle of a word that just contained special chars
                $word .= substr($string, $i);
                break; // jumps out of the for() loop
            }
        }
        $actions = $chart[$state][$type];
        for($j=0; $j<strlen($actions); $j++) {
            $act = substr($actions, $j, 1);
            if ($act == ' ') $state = 'space';
            if ($act == 'u') $state = 'unquoted';
            if ($act == 'q') $state = 'quoted';
            if ($act == 'd') $state = 'doublequoted';
            if ($act == 'b') $state = 'backticked';
            if ($act == 'e') { $previous = $state; $state = 'escaped'; }
            if ($act == 'a') $word .= $char;
            if ($act == 'w') { $out[] = $word; $word = ''; }
            if ($act == 'p') $state = $previous;
        }
    }
    if (strlen($word)) $out[] = $word;
    return $out;
}

?>