

Last updated: Fri, 10 Feb 2012


token_get_all

(PHP 4 >= 4.2.0, PHP 5)

token_get_all: Split given source into PHP tokens

Description

array token_get_all ( string $source )

token_get_all() parses the given source string into PHP language tokens using the Zend engine's lexical scanner.

For a list of parser tokens, see List of Parser Tokens, or use token_name() to translate a token value into its string representation.

Parameters

source

The PHP source to parse.

Return Values

An array of token descriptions. Each element of the array is either a single character (e.g. ;, ., >, !, etc.) or an array containing the token identifier in element 0, the corresponding source text in element 1 and the line number in element 2.
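The two element shapes described above can be handled uniformly. The following is a minimal sketch (the helper name describe_tokens() is hypothetical, not part of the tokenizer extension) that turns every element into a [name, text] pair, using token_name() for array tokens and null as the name of single-character tokens. Concatenating the text parts reproduces the source exactly.

```php
<?php
// Hypothetical helper: normalize token_get_all() output into [name, text] pairs.
// Array tokens get their symbolic name via token_name(); single-character
// tokens such as ';' are returned with a null name.
function describe_tokens($source)
{
    $result = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $result[] = array(token_name($token[0]), $token[1]);
        } else {
            $result[] = array(null, $token);
        }
    }
    return $result;
}

$pairs = describe_tokens('<?php echo 1;');
foreach ($pairs as $pair) {
    echo ($pair[0] === null ? '(char)' : $pair[0]) . ': ' . var_export($pair[1], true) . "\n";
}
```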

Examples

Example #1 token_get_all() example

<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
                                                  array(T_OPEN_TAG, '<?php '),
                                                  array(T_ECHO, 'echo'),
                                                  ';',
                                                  array(T_WHITESPACE, ' '),
                                                  array(T_CLOSE_TAG, '?>') );
   Since PHP 5.2.2, each of these arrays also holds the line number (1) in element 2. */

/* Note that in the following example, the string is parsed as
T_INLINE_HTML rather than the expected T_COMMENT (T_ML_COMMENT in PHP
before version 5), because no open/close tags are used in the "code".
This is equivalent to putting a comment outside of the <?php ?> tags in
a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */'));
?>

Changelog

Version Description
5.2.2 Line numbers are returned in element 2



User Contributed Notes token_get_all
comments at htmlcompressor dot com 18-Jan-2011 02:16
Not documented, but worth mentioning: this function detects T_OPEN_TAG or T_OPEN_TAG_WITH_ECHO based on your php.ini settings.

So, for "<?", "<?=", "?>", "<%", "<%=" and "%>" to be detected as PHP open/close tags, check the following settings in your php.ini:

; Allow Short tags <? <?= ?>
short_open_tag = On

; Allow ASP-style tags <% <%= %>
asp_tags = On
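A quick way to check which behaviour you get is to look at the first token of a short-tag snippet. This is a minimal sketch; the helper name first_token_name() is hypothetical, and it assumes that with short_open_tag off the whole snippet comes back as a single T_INLINE_HTML token. Note that since PHP 5.4, "<?=" is always recognized regardless of short_open_tag.

```php
<?php
// Hypothetical helper: name of the first token produced for $source.
function first_token_name($source)
{
    $tokens = token_get_all($source);
    $first = $tokens[0];
    return is_array($first) ? token_name($first[0]) : $first;
}

// ini_get() typically returns "1" when short_open_tag is enabled and "" or "0" otherwise.
$shortTags = (bool) ini_get('short_open_tag');
$name = first_token_name('<? echo 1; ?>');

// Expected: T_OPEN_TAG with short tags on, T_INLINE_HTML with them off.
echo 'short_open_tag=' . ($shortTags ? 'On' : 'Off') . ': ' . $name . "\n";
```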
gomodo at free dot fr 02-Aug-2009 10:08
Yes, there are some problems (on WAMP, PHP 5.3.0) with token_get_all():

1: Wrong line numbers
Since PHP 5.2.2, token_get_all() should return line numbers in element 2.
But at least in 5.3.0 on WAMP, this only works reliably with pure PHP code (not mixed with HTML). If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (the next line is returned)... :(

2: Warning messages can impact loops
Watch out for warnings on incomplete PHP code (e.g. PHP code processed line by line):
for example, if a comment is not closed, token_get_all() can block loops on this warning:
Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (PHP command line), only in web mode.

Until it is more stable, I use token_get_all() only on PHP code (not mixed with HTML):
First, extract the PHP code entirely (with the PHP open and close tags).
Second, use token_get_all() on the pure PHP code.

3: Why is there no function to extract PHP code (to extract HTML, we have Tidy...)?

In the meantime, I used a function.

The code is at the end of this post:
http://www.developpez.net/forums/d786381/php/langage/fonctions/analyser-fichier-php-token_get_all/

This function does not support:
- Old notation: "<? ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)
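As an alternative to a separate extractor, the tokenizer itself already marks everything outside the PHP tags as T_INLINE_HTML. The minimal sketch below (the function name extract_php_code() is hypothetical, and it is not the function from the linked post) keeps every other token, so the result contains only the PHP blocks, tags included. It still runs token_get_all() on the mixed file, so it inherits any tokenizer quirks on such input, such as the line-number problem described above.

```php
<?php
// Hypothetical helper: drop the HTML around PHP blocks by removing T_INLINE_HTML tokens.
// Every remaining token (open/close tags included) is concatenated unchanged.
function extract_php_code($source)
{
    $php = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            continue; // HTML outside the PHP tags
        }
        $php .= is_array($token) ? $token[1] : $token;
    }
    return $php;
}

echo extract_php_code('<p>Hi</p><?php echo 1; ?><b>bye</b>'), "\n";
```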
Dennis Robinson from basnetworks dot net 28-Jun-2009 09:24
I wanted to use the tokenizer functions to count source lines of code, including comments. Doing this with regular expressions does not work well because of cases such as /* appearing inside a string. The token_get_all() function makes this task easy by detecting all comments properly. However, it does not emit newline characters as separate tokens (they are folded into other tokens such as T_WHITESPACE). I wrote the functions below to also tokenize newline characters as T_NEW_LINE.

<?php

define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}

?>

Example usage:

<?php

$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
    }
    else
    {
        echo ('"' . $token . '"<br />');
    }
}

?>

I'm sure you can figure out how to count lines of code and lines of comments with these functions. This was a huge improvement over my previous attempt at counting lines of code with regular expressions. I hope this helps someone, as many of the user-contributed examples on this website have helped me in the past.
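As one possible approach, here is a minimal sketch that counts code lines and comment lines directly from token_get_all() output rather than through token_get_all_nl(). It tracks the current line by counting "\n" characters in each token's text; whitespace, open/close tags and inline HTML are ignored, and a line holding both code and a comment counts toward both totals. The function name count_code_and_comment_lines() is hypothetical.

```php
<?php
// Hypothetical helper: count lines containing code and lines containing comments.
// A line with both code and a comment is counted in both totals.
function count_code_and_comment_lines($source)
{
    $codeLines = array();
    $commentLines = array();
    $ignored = array(T_WHITESPACE, T_OPEN_TAG, T_CLOSE_TAG, T_INLINE_HTML);
    $line = 1;

    foreach (token_get_all($source) as $token) {
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        $newlines = substr_count($text, "\n");

        if (!in_array($id, $ignored, true)) {
            // Lines covered by this token; a trailing "\n" (as in "// note\n"
            // before PHP 8) ends the current line and does not start a new one.
            $last = $line + $newlines - (substr($text, -1) === "\n" ? 1 : 0);
            $isComment = ($id === T_COMMENT || $id === T_DOC_COMMENT);
            for ($l = $line; $l <= $last; $l++) {
                if ($isComment) {
                    $commentLines[$l] = true;
                } else {
                    $codeLines[$l] = true;
                }
            }
        }
        $line += $newlines;
    }

    return array('code' => count($codeLines), 'comment' => count($commentLines));
}

$src = "<?php\n// a comment\n\$x = 1; // trailing\n\n/* multi\n   line */\necho \$x;\n";
print_r(count_code_and_comment_lines($src));
```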
strrev xc.noxeh@ellij 22-Dec-2008 05:45
If you run token_get_all() on a string that starts with a #! line (e.g. #!/usr/local/bin/php), that line is lost.
If you rewrite a file by tokenizing it with token_get_all() and writing all the tokens back afterwards, the #! line will be gone.
This makes command-line scripts (interpreted by PHP) no longer executable.
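One workaround, sketched below under the assumption that the script is rewritten purely from its tokens, is to split off the #! line before tokenizing and put it back afterwards. The function name rewrite_preserving_shebang() is hypothetical; the token loop writes every token back unchanged, which is where real rewriting logic would go.

```php
<?php
// Hypothetical helper: round-trip a script through token_get_all() without losing "#!".
function rewrite_preserving_shebang($source)
{
    $shebang = '';
    if (substr($source, 0, 2) === '#!') {
        // Keep the whole first line, including its newline, aside.
        $eol = strpos($source, "\n");
        $cut = ($eol === false) ? strlen($source) : $eol + 1;
        $shebang = substr($source, 0, $cut);
        $source = (string) substr($source, $cut);
    }

    $out = '';
    foreach (token_get_all($source) as $token) {
        // Tokens are written back unchanged; a real rewriter would modify them here.
        $out .= is_array($token) ? $token[1] : $token;
    }

    return $shebang . $out;
}

$script = "#!/usr/local/bin/php\n<?php\necho 'hi';\n";
echo rewrite_preserving_shebang($script);
```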
kevin at metalaxe dot com 26-Apr-2008 02:58
Rogier, thanks for that fix. This bug still exists in PHP 5.2.5. I did notice, though, that your code can raise a notice. Changing this line:

            $temp[] = $tokens[0][2];

To read this:

            $temp[] = isset($tokens[0][2])?$tokens[0][2]:'unknown';

fixes this notice.
rogier 10-Jan-2008 11:01
Complementary note to the code below:
Note that only the FIRST 2 (or 3, if needed) array elements are updated.

Since I only encountered incorrect results on the FIRST occurrence of T_OPEN_TAG, I wrote this quick fix.
Any following T_OPEN_TAG tokens are parsed correctly on my test system (Apache 2.0.52, PHP 5.0.3).

So this function assumes that only the first T_OPEN_TAG may be incorrect.
It also assumes that the very first element (and ONLY the first element) of the token array is the possibly incorrect token.
In effect, the tokenized source must begin with the '<' of a PHP opening tag, followed by either '?' (optionally followed by 'php' or '=') or '%' (ASP style).
rogier at dsone dot nl 10-Jan-2008 08:37
On several PHP versions (before 5.1), token_get_all() does NOT always return the correct result.
As far as I know, this bug only shows when PHP is loaded as a module; in the CLI the bug does not seem to exist.
Related bugs: 29761 and 34782.
To work around this, here is a fixing function:

<?php
// Fixes related bugs 29761, 34782: token_get_all() does not return <?php as T_OPEN_TAG
function token_fix( &$tokens ) {
    if (!is_array($tokens) || (count($tokens) < 2)) {
        return;
    }
    // Return if no fixing is needed
    if (is_array($tokens[0]) && (($tokens[0][0] == T_OPEN_TAG) || ($tokens[0][0] == T_OPEN_TAG_WITH_ECHO))) {
        return;
    }
    // Continue: collect the pieces of the split open tag
    $p1 = (is_array($tokens[0]) ? $tokens[0][1] : $tokens[0]);
    $p2 = (is_array($tokens[1]) ? $tokens[1][1] : $tokens[1]);
    $p3 = '';

    if (($p1.$p2 == '<?') || ($p1.$p2 == '<%')) {
        // A bare '<?' or '<%' opens a plain PHP block; only a following '=' makes it echo
        $type = T_OPEN_TAG;
        $del = 2;
        // Update token type for the 3rd part?
        if (count($tokens) > 2) {
            $p3 = is_array($tokens[2]) ? $tokens[2][1] : $tokens[2];
            $del = (($p3 == 'php') || ($p3 == '=')) ? 3 : 2;
            $type = ($p3 == '=') ? T_OPEN_TAG_WITH_ECHO : $type;
        }
        // Rebuild the erroneous token; merge the 3rd part only if it belongs to the tag
        $temp = array($type, $p1.$p2.($del == 3 ? $p3 : ''));
        if (version_compare(phpversion(), '5.2.2', '<') === false) {
            $temp[] = $tokens[0][2];
        }
        // Rebuild
        $tokens[1] = '';
        if ($del == 3) $tokens[2] = '';
        $tokens[0] = $temp;
    }
    return;
}

?>
nicolas dot grekas+php at gmail dot com 03-Dec-2007 01:10
Well, there is a way to parse for errors. See
http://www.php.net/manual/function.php-check-syntax.php#77318
smp_info at yahoo dot com 30-Nov-2007 06:50
As far as I am aware, there is no way to tell whether the source code passed in is free of parse errors.

You might run into this when you're using PHP to analyze PHP source code.

In such a case, you'll get a warning similar to (but varying from): Warning: Unexpected character in input: ''' (ASCII=39) state=1

If you don't care whether the source is free of parse errors, use @token_get_all($source) to suppress the warning.
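On PHP 7.0 and later there is a way to check this with the tokenizer itself: token_get_all() accepts the TOKEN_PARSE flag, which runs the source through the parser and throws a ParseError on syntax errors. A minimal sketch follows; the helper name source_parses() is hypothetical, and this does not apply to the PHP 5 versions this note was written for.

```php
<?php
// Hypothetical helper: true if $source parses cleanly (requires PHP 7.0+ for TOKEN_PARSE).
function source_parses($source)
{
    try {
        token_get_all($source, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        return false; // $e->getMessage() and $e->getLine() describe the error
    }
}

var_dump(source_parses('<?php echo 1;'));   // bool(true)
var_dump(source_parses('<?php $x = ;'));    // bool(false)
```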
phpcomments at majiclab dot com 01-Aug-2005 10:08
Regarding bertrand at toggg dot com's comment: there is another use of { } curly braces in PHP that token_get_all() tokenizes exactly like a code block: the string index. Example:

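<?php
$text = "Hello";
// The $text{0} string-offset syntax was deprecated in PHP 7.4 and removed in PHP 8.0;
// use $text[0] on current versions.
if ($text{ 0 } == 'H') {
    echo "This example uses { for both a PHP block and a string index.";
}
?>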

Just in case some people were wondering. Since PHP produces the same token for both, parsing gets a little more interesting: you can't just assume that { ... } is a code block; it could be a number referring to an index of a string.
bertrand at toggg dot com 07-Mar-2005 10:41
If you want to retrieve the PHP blocks, count up on opening curly braces '{' and down on closing ones '}' (a counter back at zero means the block is finished).
CAUTION: the opening curly brace token can take 3 values:
1) '{' for all PHP code blocks,
2) T_CURLY_OPEN for "protected" variables within strings, as in "{$var}"
3) T_DOLLAR_OPEN_CURLY_BRACES for the extended format "${var}"

On the other hand, the closing token is always '}'!

So counting up must happen on all 3 tokens:
'{', T_CURLY_OPEN and T_DOLLAR_OPEN_CURLY_BRACES

Have fun with the PHP tokenizer!
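A minimal sketch of that counting rule (the function names opens_brace() and first_block() are hypothetical): it returns the text of the first brace-delimited block in a source string. Counting only '{' would stop too early on the '}' that closes "{$name}"; counting all three opening tokens keeps the depth balanced.

```php
<?php
// True for the three tokens that open a brace pair; all are closed by a plain '}'.
function opens_brace($token)
{
    if ($token === '{') {
        return true;
    }
    return is_array($token)
        && ($token[0] === T_CURLY_OPEN || $token[0] === T_DOLLAR_OPEN_CURLY_BRACES);
}

// Hypothetical helper: source text of the first brace-delimited block, braces included.
function first_block($source)
{
    $depth = 0;
    $text = '';
    foreach (token_get_all($source) as $token) {
        if (opens_brace($token)) {
            $depth++;
        }
        if ($depth > 0) {
            $text .= is_array($token) ? $token[1] : $token;
        }
        if ($token === '}' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                return $text;
            }
        }
    }
    return null; // no block found, or the braces are unbalanced
}

$src = '<?php function f($name) { echo "Hi {$name}!"; if (true) { return 1; } } echo 2;';
echo first_block($src), "\n";
```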
bishop 07-Dec-2004 10:58
You may want to know the line and column number at which a token begins (or ends). Since this tokenizer interface doesn't provide that information, you have to track it manually, like below:

<?php
function update_line_and_column_positions($c, &$line, &$col)
{
    // update line count
    $numNewLines = substr_count($c, "\n");
    if (1 <= $numNewLines) {
        // have new lines, add them in
        $line += $numNewLines;
        $col = 1;

        // skip to right past the last new line, as it won't affect the column position
        $c = substr($c, strrpos($c, "\n") + 1);
        if ($c === false) {
            $c = '';
        }
    }

    // update column count
    $col += strlen($c);
}

?>

Now use it, something like:

<?php

// $tokens is the result of token_get_all($source)
$line = 1;
$col  = 1;
foreach ($tokens as $token) {
    if (is_array($token)) {
        list ($id, $text) = $token;
    } else if (is_string($token)) {
        $text = $token;
    }

    // here, $line and $col hold the position at which $token begins
    update_line_and_column_positions($text, $line, $col);
}

?>

Note that this assumes your desired coordinate system is 1-based (e.g. (1,1) is the upper-left corner). Zero-based is left as an exercise for the reader.
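If you want the start position of every token rather than a running total, the same bookkeeping can be folded into one pass. This minimal sketch (the function name token_positions() is hypothetical) records a [name, line, column] triple, 1-based like the code above, before advancing past each token; subtract 1 from both coordinates for a zero-based system.

```php
<?php
// Hypothetical helper: [token name, start line, start column] for every token (1-based).
// Single-character tokens use the character itself as their name.
function token_positions($source)
{
    $positions = array();
    $line = 1;
    $col = 1;
    foreach (token_get_all($source) as $token) {
        $name = is_array($token) ? token_name($token[0]) : $token;
        $text = is_array($token) ? $token[1] : $token;
        $positions[] = array($name, $line, $col);

        $newlines = substr_count($text, "\n");
        if ($newlines > 0) {
            // After a newline, the column restarts at the text following the last "\n".
            $line += $newlines;
            $col = 1 + strlen($text) - (strrpos($text, "\n") + 1);
        } else {
            $col += strlen($text);
        }
    }
    return $positions;
}

foreach (token_positions("<?php\n\$a = 1;") as $p) {
    echo $p[0] . ' at ' . $p[1] . ':' . $p[2] . "\n";
}
```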
Leon Atkinson 06-Dec-2002 03:17
This function tokenizes PHP code. Here's an example of its use:
<?php
    $code = '<?php $a = 3; ?>';

    foreach (token_get_all($code) as $c)
    {
        if (is_array($c))
        {
            print(token_name($c[0]) . ": '" . htmlentities($c[1]) . "'\n");
        }
        else
        {
            print("$c\n");
        }
    }
?>

 