SunshinePHP Developer Conference 2015

strtok

(PHP 4, PHP 5)

strtokTokeniza uma string

Descrição

string strtok ( string $str , string $token )

strtok() divide uma string (str) em strings menores (tokens), com cada token sendo delimitado por qualquer caractere de token. Quer dizer que, se você tem uma string como "Esta é uma string de exemplo" você poderia "tokenizá-la" em suas palavras individuais usando o caractere de espaço como delimitador do token.

Note que só a primeira chamada a strtok usa o argumento string. Cada chamada subseqüente a strtok só precisa do delimitador a ser usado, já que ele mantém o caminho de onde ele está na string atual. Para iniciar novamente, ou tokenizar uma nova string você simplesmente chama strtok com o argumento de string novamente para inicializá-la. Veja que você pode por múltiplos delimitadores como parâmetro. A string será tokenizada quando um dos caracteres no argumento são encontrados.

Parâmetros

str

A string a ser repartida em pequenos pedaços (tokens).

token

O delimitador usado para repatir a str.

Valor Retornado

Uma string de token.

Exemplos

Exemplo #1 Exemplo da strtok()

<?php
$string 
"This is\tan example\nstring";
/* Use tab e newline como caractere delimitador, que funciona bem  */
$tok strtok($string," \n\t");

while (
$tok !== false) {
    echo 
"Word=$tok<br/>";
    
$tok strtok(" \n\t");
}
?>

O comportamento quando uma parte vazia foi encontrada mudou com o PHP 4.1.0. O antigo comportamento retornava uma string vazia, enquanto a nova, a correta, simplesmente pula a parte da string:

Exemplo #2 Comportamento antigo da strtok()

<?php
$first_token  
strtok('/something''/');
$second_token strtok('/');
var_dump ($first_token$second_token);
?>

O exemplo acima irá imprimir:

    string(0) ""
    string(9) "something"

Exemplo #3 Novo comportamento da strtok()

<?php
$first_token  
strtok('/something''/');
$second_token strtok('/');
var_dump ($first_token$second_token);
?>

O exemplo acima irá imprimir:

    string(9) "something"
    bool(false)

Notas

Aviso

Esta função pode retornar o booleano FALSE, mas também pode retornar um valor não-booleano que pode ser avaliado como FALSE, como 0 ou "". Leia a seção em Booleanos para maiores informações. Utilize o operador === para testar o valor retornado por esta função.

Veja Também

  • split() - Separa strings em array utilizando expressões regulares
  • explode() - Divide uma string em strings

add a note add a note

User Contributed Notes 15 notes

up
14
eep2004 at ukr dot net
1 year ago
<?php
// strtok example
$str = 'Hello to all of Ukraine';
echo
strtok($str, ' ').' '.strtok(' ').' '.strtok(' ');
?>
Result:
Hello to all
up
6
manicdepressive at mindless dot com
10 years ago
<pre><?php
/** get leading, trailing, and embedded separator tokens that were 'skipped'
if for some ungodly reason you are using php to implement a simple parser that
needs to detect nested clauses as it builds a parse tree */

$str = "(((alpha(beta))(gamma))";

$seps = '()';
$tok = strtok( $str,$seps ); // return false on empty string or null
$cur = 0;     
$dumbDone = FALSE;
$done = (FALSE===$tok);
while (!
$done) {
  
// process skipped tokens (if any at first iteration) (special for last)
  
$posTok = $dumbDone ? strlen($str) : strpos($str, $tok, $cur );
  
$skippedMany = substr( $str, $cur, $posTok-$cur ); // false when 0 width
  
$lenSkipped = strlen($skippedMany); // 0 when false
  
if (0!==$lenSkipped) {
     
$last = strlen($skippedMany) -1;
      for(
$i=0; $i<=$last; $i++){
        
$skipped = $skippedMany[$i];
        
$cur += strlen($skipped);
         echo
"skipped: $skipped\n";
      }
   }
   if (
$dumbDone) break; // this is the only place the loop is terminated

   // process current tok
  
echo "curr tok: ".$tok."\n";

  
// update cursor
  
$cur += strlen($tok);

  
// get any next tok
  
if (!$dumbDone){
     
$tok = strtok($seps);
     
$dumbDone = (FALSE===$tok);
     
// you're not really done till you check for trailing skipped
  
}
};
?></pre>
up
2
pradador at me dot com
3 years ago
Here's a simple class that allows you to iterate through string tokens using a foreach loop.

<?php
/**
* The TokenIterator class allows you to iterate through string tokens using
* the familiar foreach control structure.
*
* Example:
* <code>
* <?php
* $string = 'This is a test.';
* $delimiters = ' ';
* $ti = new TokenIterator($string, $delimiters);
*
* foreach ($ti as $count => $token) {
*     echo sprintf("%d, %s\n", $count, $token);
* }
*
* // Prints the following output:
* // 0. This
* // 1. is
* // 2. a
* // 3. test.
* </code>
*/
class TokenIterator implements Iterator
{
   
/**
     * The string to tokenize.
     * @var string
     */
   
protected $_string;
   
   
/**
     * The token delimiters.
     * @var string
     */
   
protected $_delims;
   
   
/**
     * Stores the current token.
     * @var mixed
     */
   
protected $_token;
   
   
/**
     * Internal token counter.
     * @var int
     */
   
protected $_counter = 0;
   
   
/**
     * Constructor.
     *
     * @param string $string The string to tokenize.
     * @param string $delims The token delimiters.
     */
   
public function __construct($string, $delims)
    {
       
$this->_string = $string;
       
$this->_delims = $delims;
       
$this->_token = strtok($string, $delims);
    }
   
   
/**
     * @see Iterator::current()
     */
   
public function current()
    {
        return
$this->_token;
    }

   
/**
     * @see Iterator::key()
     */
   
public function key()
    {
        return
$this->_counter;
    }

   
/**
     * @see Iterator::next()
     */
   
public function next()
    {
       
$this->_token = strtok($this->_delims);
       
        if (
$this->valid()) {
            ++
$this->_counter;
        }
    }

   
/**
     * @see Iterator::rewind()
     */
   
public function rewind()
    {
       
$this->_counter = 0;
       
$this->_token   = strtok($this->_string, $this->_delims);
    }

   
/**
     * @see Iterator::valid()
     */
   
public function valid()
    {
        return
$this->_token !== FALSE;
    }
}
?>
up
1
Logikos
5 years ago
This looks very simple, but it took me a long time to figure out so I thought I'd share it incase someone else was wanting the same thing:

this should work similar to substr() but with tokens instead!

<?php
/* subtok(string,chr,pos,len)
*
* chr = chr used to seperate tokens
* pos = starting postion
* len = length, if negative count back from right
*
*  subtok('a.b.c.d.e','.',0)     = 'a.b.c.d.e'
*  subtok('a.b.c.d.e','.',0,2)   = 'a.b'
*  subtok('a.b.c.d.e','.',2,1)   = 'c'
*  subtok('a.b.c.d.e','.',2,-1)  = 'c.d'
*  subtok('a.b.c.d.e','.',-4)    = 'b.c.d.e'
*  subtok('a.b.c.d.e','.',-4,2)  = 'b.c'
*  subtok('a.b.c.d.e','.',-4,-1) = 'b.c.d'
*/
function subtok($string,$chr,$pos,$len = NULL) {
  return
implode($chr,array_slice(explode($chr,$string),$pos,$len));
}
?>

explode breaks the tokens up into an array, array slice alows you to pick then tokens you want, and then implode converts it back to a string

although its far from a clone, this was inspired by mIRC's gettok() function
up
1
mac.com@nemo
8 years ago
This function takes a string and returns an array with words (delimited by spaces), also taking into account quotes, doublequotes, backticks and backslashes (for escaping stuff).
So

$string = "cp   'my file' to `Judy's file`";
var_dump(parse_cli($string));

would yield:

array(4) {
  [0]=>
  string(2) "cp"
  [1]=>
  string(7) "my file"
  [2]=>
  string(5) "to"
  [3]=>
  string(11) "Judy's file"
}

Way it works, runs through the string character by character, for each character looking up the action to take, based on that character and its current $state.
Actions can be (one or more of) adding the character/string to the current word, adding the word to the output array, and changing or (re)storing the state.
For example a space will become part of the current 'word' (or 'token') if $state is 'doublequoted', but it will start a new token if $state was 'unquoted'.
I was later told it's a "tokeniser using a finite state automaton". Who knew :-)

<?php

#_____________________
# parse_cli($string) /
function parse_cli($string) {
   
$state = 'space';
   
$previous = '';     // stores current state when encountering a backslash (which changes $state to 'escaped', but has to fall back into the previous $state afterwards)
   
$out = array();     // the return value
   
$word = '';
   
$type = '';         // type of character
    // array[states][chartypes] => actions
   
$chart = array(
       
'space'        => array('space'=>'',   'quote'=>'q''doublequote'=>'d''backtick'=>'b''backslash'=>'ue', 'other'=>'ua'),
       
'unquoted'     => array('space'=>'w ', 'quote'=>'a''doublequote'=>'a''backtick'=>'a''backslash'=>'e''other'=>'a'),
       
'quoted'       => array('space'=>'a''quote'=>'w ', 'doublequote'=>'a''backtick'=>'a''backslash'=>'e''other'=>'a'),
       
'doublequoted' => array('space'=>'a''quote'=>'a''doublequote'=>'w ', 'backtick'=>'a''backslash'=>'e''other'=>'a'),
       
'backticked'   => array('space'=>'a''quote'=>'a''doublequote'=>'a''backtick'=>'w ', 'backslash'=>'e''other'=>'a'),
       
'escaped'      => array('space'=>'ap', 'quote'=>'ap', 'doublequote'=>'ap', 'backtick'=>'ap', 'backslash'=>'ap', 'other'=>'ap'));
    for (
$i=0; $i<=strlen($string); $i++) {
       
$char = substr($string, $i, 1);
       
$type = array_search($char, array('space'=>' ', 'quote'=>'\'', 'doublequote'=>'"', 'backtick'=>'`', 'backslash'=>'\\'));
        if (!
$type) $type = 'other';
        if (
$type == 'other') {
           
// grabs all characters that are also 'other' following the current one in one go
           
preg_match("/[ \'\"\`\\\]/", $string, $matches, PREG_OFFSET_CAPTURE, $i);
            if (
$matches) {
               
$matches = $matches[0];
               
$char = substr($string, $i, $matches[1]-$i); // yep, $char length can be > 1
               
$i = $matches[1] - 1;
            }else{
               
// no more match on special characters, that must mean this is the last word!
                // the .= hereunder is because we *might* be in the middle of a word that just contained special chars
               
$word .= substr($string, $i);
                break;
// jumps out of the for() loop
           
}
        }
       
$actions = $chart[$state][$type];
        for(
$j=0; $j<strlen($actions); $j++) {
           
$act = substr($actions, $j, 1);
            if (
$act == ' ') $state = 'space';
            if (
$act == 'u') $state = 'unquoted';
            if (
$act == 'q') $state = 'quoted';
            if (
$act == 'd') $state = 'doublequoted';
            if (
$act == 'b') $state = 'backticked';
            if (
$act == 'e') { $previous = $state; $state = 'escaped'; }
            if (
$act == 'a') $word .= $char;
            if (
$act == 'w') { $out[] = $word; $word = ''; }
            if (
$act == 'p') $state = $previous;
        }
    }
    if (
strlen($word)) $out[] = $word;
    return
$out;
}

?>
up
0
eep2004 at ukr dot net
15 days ago
Remove GET variables from the URL
<?php
$url
= strtok('http://php.net/manual/en/ref.strings.php?foo=1&bar=2', '?');
// $url = 'http://php.net/manual/en/ref.strings.php'
up
0
info at maisuma dot jp
4 months ago
If you want to tokenize by only one letter, explode() is much faster compared to strtok().

<?php
$str
=str_repeat('foo ',10000);

//explode()
$time=microtime(TRUE);
$arr=explode($str,' ');
$time=microtime(TRUE)-$time;
echo
"explode():$time sec.".PHP_EOL;

//strtok()
$time=microtime(TRUE);
$ret=strtok(' ',$str);
while(
$ret!==FALSE){
   
$ret=strtok(' ');
}
$time=microtime(TRUE)-$time;
echo
"strtok():$time sec.".PHP_EOL;

?>

The result is : (PHP 5.3.3 on CentOS)

explode():0.001317024230957 sec.
strtok():0.0058917999267578 sec.

explode() is about five times fast in short strings, too.
up
0
Axeia
8 months ago
Might be pointing out the obvious but if you'd rather use a for loop rather than a while (to keep the token strings on the same line for readability for example), it can be done. Added bonus, it doesn't put a $tok variable outside the loop itself either.
Downside however is that you're not able to manually free up the memory used using the technique mentioned by elarlang.

<?php
for($tok = strtok($str, ' _-.'); $tok!==false; $tok = strtok(' _-.'))
{
  echo
"$tok </br>";
}
?>
up
0
gilthans at NOSPAM dot gmail dot com
2 years ago
Note that strtok may receive different tokens each time. Therefore, if, for example, you wish to extract several words and then the rest of the sentence:

<?php
$text
= "13 202 5 This is a long message explaining the error codes.";
$error1 = strtok($text, " "); //13
$error2 = strtok(" "); //202
$error3 = strtok(" "); //5
$error_message = strtok(""); //Notice the different token parameter
echo $error_message; //This is a long message explaining the error codes.
?>
up
0
elarlang at gmail dot com
3 years ago
If you have memory-usage critical solution, you should keep in mind, that strtok function holds input string parameter (or reference to it?) in memory after usage.

<?php
function tokenize($str, $token_symbols) {
   
$word = strtok($str, $token_symbols);
    while (
false !== $word) {
       
// do something here...
       
$word = strtok($token_symbols);
    }
}
?>
Test-cases with handling ~10MB plain-text file:
Case #1 - unset $str variable
<?php
$token_symbols
= " \t\n";
$str = file_get_contents('10MB.txt'); // mem usage 9.75383758545 MB (memory_get_usage() / 1024 / 1024));
tokenize($str, $token_symbols); // mem usage 9.75400161743 MB
unset($str); // 9.75395584106 MB
?>
Case #1 result: memory is still used

Case #2 - call strtok again
<?php
$token_symbols
= " \t\n";
$str = file_get_contents('10MB.txt'); // 9.75401306152 MB
tokenize($str, $token_symbols); // 9.75417709351
strtok('', ''); // 9.75421524048
?>
Case #2 result: memory is still used

Case #3 - call strtok again AND unset $str variable
<?php
$token_symbols
= " \t\n";
$str = file_get_contents('10MB.txt'); // 9.75410079956 MB
tokenize($str, $token_symbols); // 9.75426483154 MB
unset($str);
strtok('', ''); // 0.0543975830078 MB
?>
Case #3 result: memory is free

So, better solution for tokenize function:
<?php
function tokenize($str, $token_symbols, $token_reset = true) {
   
$word = strtok($str, $token_symbols);
    while (
false !== $word) {
       
// do something here...
       
$word = strtok($token_symbols);
    }

    if(
$token_reset)
       
strtok('', '');
}
?>
up
0
benighted at gmail dot com
4 years ago
Simple way to tokenize search parameters, including double or single quoted keys.  If only one quote is found, the rest of the string is assumed to be part of that token.

<?php
            $token
= strtok($keywords,' ');
            while (
$token) {
               
// find double quoted tokens
               
if ($token{0}=='"') { $token .= ' '.strtok('"').'"'; }
               
// find single quoted tokens
               
if ($token{0}=="'") { $token .= ' '.strtok("'")."'"; }

               
$tokens[] = $token;
               
$token = strtok(' ');
            }
?>

Use substr(1,strlen($token)) and remove the part that adds the trailing quotes if you want your output without quotes.
up
0
azeem
5 years ago
Here is a java like StringTokenizer class using strtok function:

<?php

/**
* The string tokenizer class allows an application to break a string into tokens.
*
* @example The following is one example of the use of the tokenizer. The code:
* <code>
* <?php
*    $str = 'this is:@\t\n a test!';
*    $delim = ' !@:'\t\n; // remove these chars
*    $st = new StringTokenizer($str, $delim);
*    while ($st->hasMoreTokens()) {
*        echo $st->nextToken() . "\n";
*    }
*    prints the following output:
*      this
*      is
*      a
*      test
* ?>
* </code>
*/
class StringTokenizer {

   
/**
     * @var string
     */
   
private $token;

   
/**
     * @var string
     */
   
private $delim;
   
/**
     * Constructs a string tokenizer for the specified string
     * @param string $str String to tokenize
     * @param string $delim The set of delimiters (the characters that separate tokens)
     * specified at creation time, default to ' '
     */
   
public function __construct(/*string*/ $str, /*string*/ $delim = ' ') {
       
$this->token = strtok($str, $delim);
       
$this->delim = $delim;
    }

    public function
__destruct() {
        unset(
$this);
    }

   
/**
     * Tests if there are more tokens available from this tokenizer's string. It
     * does not move the internal pointer in any way. To move the internal pointer
     * to the next element call nextToken()
     * @return boolean - true if has more tokens, false otherwise
     */
   
public function hasMoreTokens() {
        return (
$this->token !== false);
    }

   
/**
     * Returns the next token from this string tokenizer and advances the internal
     * pointer by one.
     * @return string - next element in the tokenized string
     */
   
public function nextToken() {
       
$current = $this->token;
       
$this->token = strtok($this->delim);
        return
$current;
    }
}
?>
up
0
yanick dot rochon at gmail dot com
5 years ago
Here is a small function I wrote as I needed to extract some named tokens from a string (a la Google). For example, I needed to format a string like "extension:gif size:64M animated:true author:'John Bash'" into

array(
  'extension' => 'gif',
  'size' => '64M',
  'animated' => true,
  'author' => 'John Bash'
)

So, here's the code:

<?php

header
('Content-type: text/plain; charset=utf-8');

/**
* NOTE : use mbstring.func_overload for multi-byte support with this function
*
* @param string $string             the string to tokenize
* @param int $offset                the starting offset
* @param string $defaultTokenName   the default token name if none specified
* @param string $groupDelimiters    the characters to delimit token groups
* @param string $groupNameDelimiter the character(s) to delimit token group names
* @return array
*/
function getTokens(
       
$string,
       
$offset = 0,
       
$defaultTokenName = null,
       
$groupDelimiters = '\'"',
       
$groupNameDelimiter = ':')
{

    if (
$offset >= strlen($string)) {
       
//echo "offset out of range";
       
return false;
    }

   
$spaces = " \t\n\r";   // space characters

    // add group delimiters to spaces...
   
$groupSpaces = $spaces . $groupNameDelimiter;
   
$delimiters = $groupSpaces . $groupDelimiters;

   
//var_dump($groupSpaces);

   
$string = ltrim(substr($string, $offset), $groupSpaces);
   
$token_strings = array();

   
//echo "String is : " . $string . "\n";

    // 1. split all tokens...
   
while ($offset < strlen($string)) {
       
$lastOffset = $offset;
       
$escaped = false;

        if (
false !== strpos($groupDelimiters, $char = $string[$offset])) {
           
$groupChar = $char;
        } else {
           
$groupChar = null;
        }

        if (
null !== $groupChar) {
            while ((
$offset < strlen($string)) && (($groupChar !== ($char = $string[++$offset])) || $escaped)) {
               
//$offset++;
               
$escaped = ('\\' === $char);
            }
           
$offset++;
           
//echo "*** Grouped : " . substr($string, $lastOffset, $offset - $lastOffset) . "\n";
       
} else {
            while ((
$offset < strlen($string)) && ((false === strpos($delimiters, $char = $string[$offset])) || $escaped)) {
               
$offset++;
               
$escaped = ('\\' === $char);
            }
           
//echo "*** Non-group : " . substr($string, $lastOffset, $offset - $lastOffset) . "\n";
       
}
       
//skip spaces...
       
while (($offset < strlen($string)) && ((false !== strpos($groupSpaces, $char = $string[$offset])) || $escaped)) {
           
$offset++;
           
$escaped = ('\\' === $char);
        }

       
$token_strings[] = substr($string, $lastOffset, $offset - $lastOffset);
       
//echo "Next token = '" . end($token_strings) . "'\n";
   
}

   
$tokens = array();
   
$tokenName = null;
    foreach (
$token_strings as $token_str) {
       
// clean $token_str
       
$token_str = trim(stripslashes($token_str), $spaces);
       
$str_value = trim($token_str, $delimiters);
        switch (
strtolower($str_value)) {
            case
'true': $str_value = true; break;
            case
'false': $str_value = false; break;
            default: break;
        }

       
// is it a token name?
       
if (':' === substr($token_str, -1, 1)) {
            if (!empty(
$tokenName)) {
               
$tokens[$tokenName] = '';
            }
           
$tokenName = trim($token_str, $delimiters);
        } else {
            if (!empty(
$tokenName)) {
                if (isset(
$tokens[$tokenName])) {
                   
$tokens[$tokenName] = array(
                       
$tokens[$tokenName],
                       
$str_value
                   
);
                } else {
                   
$tokens[$tokenName] = $str_value;
                }
               
$tokenName = null;
            } elseif (empty(
$defaultTokenName)) {
               
$tokens[] = trim($token_str, $delimiters);;
            } else {
                if (isset(
$tokens[$defaultTokenName])) {
                   
$tokens[$defaultTokenName] = array(
                       
$tokens[$defaultTokenName],
                       
$str_value
                   
);
                } else {
                   
$tokens[$defaultTokenName] = $str_value;
                }
            }
        }
    }
    if (!empty(
$tokenName)) {
       
$tokens[$tokenName] = '';
    }

    return
$tokens;
}

$str = "check1: test "
    
. "check2:'hello world' "
    
. 'check3: "foo" '
    
. "check4: \\\"try this\\\""
    
. '"buz" '
    
. 'check1:true';

?>
up
-1
KrazyBox
5 years ago
As of the change in strtok()'s handling of empty strings, it is now useless for scripts that rely on empty data to function.

Take for instance, a standard header. (with UNIX newlines)

http/1.0 200 OK\n
Content-Type: text/html\n
\n
--HTML BODY HERE---

When parsing this with strtok, one would wait until it found an empty string to signal the end of the header. However, because strtok now skips empty segments, it is impossible to know when the header has ended.
This should not be called `correct' behavior, it certainly is not. It has rendered strtok incapable of (properly) processing a very simple standard.

This new functionality, however, does not affect Windows style headers. You would search for a line that only contains "\r"
This, however, is not a justification for the change.
up
-1
fabiolimasouto at gmail dot com
3 years ago
this example will hopefully help you understand how this function works:

<?php
$selector
= 'div.class#id';
$tagname = strtok($selector,'.#');
echo
$tagname.'<br/>';

while(
$tok = strtok('.#'))
{
echo
$tok.'<br/>';
}

?>

Outputs:
div
class
id
To Top