The DOMDocument class

(PHP 5)

Introduction

Represents an entire HTML or XML document; serves as the root of the document tree.

Class synopsis

DOMDocument extends DOMNode {
/* Properties */
readonly public string $actualEncoding ;
readonly public DOMConfiguration $config ;
readonly public DOMDocumentType $doctype ;
readonly public DOMElement $documentElement ;
public string $documentURI ;
public string $encoding ;
public bool $formatOutput ;
readonly public DOMImplementation $implementation ;
public bool $preserveWhiteSpace = true ;
public bool $recover ;
public bool $resolveExternals ;
public bool $standalone ;
public bool $strictErrorChecking = true ;
public bool $substituteEntities ;
public bool $validateOnParse = false ;
public string $version ;
readonly public string $xmlEncoding ;
public bool $xmlStandalone ;
public string $xmlVersion ;
/* Methods */
public __construct ([ string $version [, string $encoding ]] )
public DOMAttr createAttribute ( string $name )
public DOMAttr createAttributeNS ( string $namespaceURI , string $qualifiedName )
public DOMCDATASection createCDATASection ( string $data )
public DOMComment createComment ( string $data )
public DOMDocumentFragment createDocumentFragment ( void )
public DOMElement createElement ( string $name [, string $value ] )
public DOMElement createElementNS ( string $namespaceURI , string $qualifiedName [, string $value ] )
public DOMEntityReference createEntityReference ( string $name )
public DOMProcessingInstruction createProcessingInstruction ( string $target [, string $data ] )
public DOMText createTextNode ( string $content )
public DOMElement getElementById ( string $elementId )
public DOMNodeList getElementsByTagName ( string $name )
public DOMNodeList getElementsByTagNameNS ( string $namespaceURI , string $localName )
public DOMNode importNode ( DOMNode $importedNode [, bool $deep ] )
public mixed load ( string $filename [, int $options = 0 ] )
public bool loadHTML ( string $source [, int $options = 0 ] )
public bool loadHTMLFile ( string $filename [, int $options = 0 ] )
public mixed loadXML ( string $source [, int $options = 0 ] )
public void normalizeDocument ( void )
public bool registerNodeClass ( string $baseclass , string $extendedclass )
public bool relaxNGValidate ( string $filename )
public bool relaxNGValidateSource ( string $source )
public int save ( string $filename [, int $options ] )
public string saveHTML ([ DOMNode $node = NULL ] )
public int saveHTMLFile ( string $filename )
public string saveXML ([ DOMNode $node [, int $options ]] )
public bool schemaValidate ( string $filename [, int $flags ] )
public bool schemaValidateSource ( string $source [, int $flags ] )
public bool validate ( void )
public int xinclude ([ int $options ] )
/* Inherited methods */
public DOMNode DOMNode::appendChild ( DOMNode $newnode )
public string DOMNode::C14N ([ bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )
public int DOMNode::C14NFile ( string $uri [, bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )
public DOMNode DOMNode::cloneNode ([ bool $deep ] )
public int DOMNode::getLineNo ( void )
public string DOMNode::getNodePath ( void )
public bool DOMNode::hasAttributes ( void )
public bool DOMNode::hasChildNodes ( void )
public DOMNode DOMNode::insertBefore ( DOMNode $newnode [, DOMNode $refnode ] )
public bool DOMNode::isDefaultNamespace ( string $namespaceURI )
public bool DOMNode::isSameNode ( DOMNode $node )
public bool DOMNode::isSupported ( string $feature , string $version )
public string DOMNode::lookupNamespaceURI ( string $prefix )
public string DOMNode::lookupPrefix ( string $namespaceURI )
public void DOMNode::normalize ( void )
public DOMNode DOMNode::removeChild ( DOMNode $oldnode )
public DOMNode DOMNode::replaceChild ( DOMNode $newnode , DOMNode $oldnode )
}

Properties

actualEncoding

Deprecated. The actual encoding of the document; read-only and equivalent to encoding.

config

Deprecated. Configuration used when DOMDocument::normalizeDocument() is invoked.

doctype

The document type declaration associated with this document.

documentElement

A convenience attribute that gives direct access to the child node that is the document element of the document.

documentURI

The location of the document, or NULL if undefined.

encoding

The encoding of the document, as specified by the XML declaration. This attribute is not present in the final DOM Level 3 specification, but it is the only way to manipulate the XML document encoding in this implementation.

formatOutput

Nicely formats the output with indentation and extra space.

implementation

The DOMImplementation object that handles this document.

preserveWhiteSpace

Do not remove redundant white space. Defaults to TRUE.
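
The following is a minimal sketch of how formatOutput and preserveWhiteSpace can be combined to re-indent a document. The sample markup and the variable names are made up for illustration. preserveWhiteSpace is set before loading because it affects parsing, while formatOutput only affects serialization.

```php
<?php
// Hypothetical input with irregular whitespace between elements.
$source = "<root>\n<item>a</item>\n\n\n<item>b</item>\n</root>";

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop whitespace-only text nodes while parsing
$doc->formatOutput = true;        // indent the tree when serializing
$doc->loadXML($source);

$pretty = $doc->saveXML();
echo $pretty;
```

With both properties left at their defaults, the original whitespace would be kept as text nodes and the serializer would not add any indentation of its own.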

recover

Proprietary. Enables recovery mode, i.e. attempts to parse documents that are not well formed. This attribute is not part of the DOM specification and is specific to libxml.

resolveExternals

Set it to TRUE to load external entities from a doctype declaration. This is useful for including entities in your XML documents.

standalone

Deprecated. Whether or not the document is standalone, as specified by the XML declaration; corresponds to xmlStandalone.

strictErrorChecking

Throws a DOMException on errors. Defaults to TRUE.
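
As a small sketch of the default behaviour (strictErrorChecking left at TRUE), the snippet below asks for an element whose name is not a valid XML name and catches the resulting DOMException. The element name is a made-up example. Setting the property to FALSE is expected to downgrade the error to a warning, which this sketch does not exercise.

```php
<?php
$doc = new DOMDocument(); // strictErrorChecking is TRUE by default

$code = null;
try {
    // A space is not allowed in an XML element name.
    $doc->createElement('not valid');
} catch (DOMException $e) {
    $code = $e->getCode();
}

echo $code === DOM_INVALID_CHARACTER_ERR ? "invalid name rejected\n" : "no exception\n";
```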

substituteEntities

Proprietary. Whether or not to substitute entities. This attribute is not part of the DOM specification and is specific to libxml.

validateOnParse

Loads the DTD and validates against it. Defaults to FALSE.

version

Deprecated. Version of XML; corresponds to xmlVersion.

xmlEncoding

An attribute specifying the encoding of the document. It is NULL when the encoding is unspecified or unknown, such as when the document was created in memory.

xmlStandalone

An attribute specifying whether the document is standalone. It is FALSE when unspecified.

xmlVersion

An attribute specifying the version number of the document. If there is no declaration and the document supports the "XML" feature, the value is "1.0".

Notes

Note:

The DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding, or iconv for other encodings.
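
A minimal sketch of the conversion step, assuming the iconv extension is available and the input is known to be ISO-8859-1 (the sample string and variable names are illustrative only). iconv() is used here rather than utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// "Café" as ISO-8859-1 bytes (hypothetical legacy input).
$latin1 = "Caf\xE9";

// Convert to UTF-8 before handing the text to DOM.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$doc  = new DOMDocument('1.0', 'UTF-8');
$name = $doc->appendChild($doc->createElement('name'));
$name->appendChild($doc->createTextNode($utf8));

$xml = $doc->saveXML();
echo $xml;
```

The same call works for other source encodings by changing the first argument of iconv().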


User Contributed Notes 11 notes

Fernando H
6 years ago
Here is a quick example of how to use this class, so that new users can get started without having to figure it all out by themselves. (At the time of posting, this documentation had just been added and was lacking examples.)

<?php

// Set the content type to XML, so that the browser will recognise it as XML.
header( "content-type: application/xml; charset=ISO-8859-15" );

// "Create" the document.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );

// Create some elements.
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track", "The ninth symphony" );

// Set the attributes.
$xml_track->setAttribute( "length", "0:01:15" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );

// Create another element, just to show that elements can be nested to any practical depth.
$xml_note = $xml->createElement( "Note", "The last symphony composed by Ludwig van Beethoven." );

// Append the whole bunch.
$xml_track->appendChild( $xml_note );
$xml_album->appendChild( $xml_track );

// Repeat the above with some different values..
$xml_track = $xml->createElement( "Track", "Highway Blues" );

$xml_track->setAttribute( "length", "0:01:33" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
$xml_album->appendChild( $xml_track );

$xml->appendChild( $xml_album );

// Serialize the document to XML and print it.
print $xml->saveXML();

?>

Output (indented here for readability; saveXML() also emits the XML declaration first):
<Album>
  <Track length="0:01:15" bitrate="64kb/s" channels="2">
    The ninth symphony
    <Note>
      The last symphony composed by Ludwig van Beethoven.
    </Note>
  </Track>
  <Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>
</Album>

If you want your PHP->DOM code to run under the .xml extension, you should set up your web server to run the .xml extension with PHP (refer to the installation/configuration documentation for PHP on how to do this).

Note that this:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

is NOT the same as this:
<?php
// Will NOT work.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml_track = new DOMElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

although this will work:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml->appendChild( $xml_album );
?>
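
The difference comes from the fact that an element created with `new DOMElement` is read-only until it is associated with a document, so nothing can be appended to it before then. A minimal sketch of one way around this, reusing the element names from the note above (the attribute value and variable names are illustrative), is to attach each element first and work with the node that appendChild() returns:

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');

// Attach the standalone element first; appendChild() returns the attached node.
$album = $doc->appendChild(new DOMElement('Album'));

// Now that $album belongs to $doc, children can be added to it.
$track = $album->appendChild(new DOMElement('Track', 'Highway Blues'));
$track->setAttribute('length', '0:01:33');

echo $doc->saveXML();
```

Using $xml->createElement() as in the first snippet avoids the issue entirely, because the element is bound to the document from the start.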
evert at er dot nl
3 years ago
A nice and simple node 2 array I wrote, worth a try ;)

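<?php
// Recursively converts a DOM node into a nested array of its attributes and children.
function getArray($node)
{
    // Start from an empty array; auto-creating an array from false is deprecated in recent PHP versions.
    $array = array();

    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }

    if ($node->hasChildNodes())
    {
        if ($node->childNodes->length == 1)
        {
            $array[$node->firstChild->nodeName] = $node->firstChild->nodeValue;
        }
        else
        {
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType != XML_TEXT_NODE)
                {
                    // This is a plain function, so recurse directly rather than through $this.
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    }

    return $array;
}
?>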
fcartegnie
4 years ago
Be careful with the formatOutput property.

Creating an empty node like this:
createElement('foo','')
instead of
createElement('foo')
will break formatOutput.
sites.sitesbr.net
1 year ago
How to objectify a DOMDocument with a hierarchy like:
<root>
    <item>
        <prop1>info1</prop1>
        <prop2>info2</prop2>
        <prop3>info3</prop3>
    </item>
    <item>
        <prop1>info1</prop1>
        <prop2>info2</prop2>
        <prop3>info3</prop3>
    </item>
</root>

It is then possible to retrieve information in object style, as in:

<?php
$theNodeValue = $aitem->prop1;
?>

Here is the code: one Class and 2 functions.

<?php
// Child nodes and attributes are stored as dynamic properties, which PHP 8.2+ only allows
// without a deprecation notice when the class carries this attribute (older versions read it as a comment).
#[\AllowDynamicProperties]
class ArrayNode {
    public $nodeName, $nodeValue;
}

// Returns only the element children of a node (nodeType 1 is XML_ELEMENT_NODE).
function getChildNodeElements( $domNode ){
    $nodes = array();
    for( $i=0; $i < $domNode->childNodes->length; $i++ ){
        $cn = $domNode->childNodes->item($i);
        if( $cn->nodeType == 1 ){
            $nodes[] = $cn;
        }
    }
    return $nodes;
}

function getArrayNodes( $domDoc ){
    $res = array();

    for( $i=0; $i < $domDoc->childNodes->length; $i++ ){
        $cn = $domDoc->childNodes->item($i);
        # The first is the root tag...
        if( $cn->nodeType == 1 ){
            # But we want its childNodes.
            $sub_cn = getChildNodeElements( $cn );
            # Found the tagName:
            $baseItemTagName = $sub_cn[0]->nodeName;
            break;
        }
    }

    $dnl = $domDoc->getElementsByTagName( $baseItemTagName );

    for( $i=0; $i < $dnl->length; $i++ ){
        $arrayNode = new ArrayNode();

        # Summary
        $arrayNode->nodeName = $dnl->item($i)->nodeName;
        $arrayNode->nodeValue = $dnl->item($i)->nodeValue;

        # Child Nodes
        $cn = $dnl->item($i)->childNodes;
        for( $k=0; $k < $cn->length; $k++ ){
            if( $cn->item($k)->nodeName == "#text" && trim($cn->item($k)->nodeValue) == "" ) continue;
            $arrayNode->{$cn->item($k)->nodeName} = $cn->item($k)->nodeValue;
        }

        # Attributes
        $attr = $dnl->item($i)->attributes;
        for( $k=0; $k < $attr->length; $k++ ){
            if( !is_null($attr) ){
                if( $attr->item($k)->nodeName == "#text" && trim($attr->item($k)->nodeValue) == "" ) continue;
                $arrayNode->{$attr->item($k)->nodeName} = $attr->item($k)->nodeValue;
            }
        }

        $res[] = $arrayNode;
    }

    return $res;
}
?>

To use it:

<?php

# First you load an XML file into a DOMDocument variable.

$url = "/path/to/yourxmlfile.xml";
$domSrc = file_get_contents($url);
$dom = new DomDocument();
$dom->loadXML( $domSrc );

# Then, you get the ArrayNodes from the DOMDocument.

$ans = getArrayNodes( $dom );

for( $i=0; $i < count( $ans ); $i++ ){

    $cn = $ans[ $i ];

    $info1 = $cn->prop1;
    $info2 = $cn->prop2;
    $info3 = $cn->prop3;

    // ...

}

?>
jay at jaygilford dot com
4 years ago
Here's a small function I wrote to get all page links using DOMDocument, which will hopefully be of use to others.

<?php
/**
 * @author Jay Gilford
 */

/**
 * get_links()
 *
 * @param string $url
 * @return array
 */
function get_links($url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the dom and add it to the link array
    foreach($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }

    // Return the links
    return $links;
}
?>
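
Real-world pages are often not valid HTML, and the parser reports each problem as a warning. The sketch below is a variation on the idea above, not the author's function: it works on an HTML string instead of a URL so that it is self-contained, and it uses libxml_use_internal_errors() to keep parser warnings out of the output. The function name extract_links_from_html() and the sample markup are hypothetical.

```php
<?php
// Collects the href and text of every <a> element that has a non-empty href.
function extract_links_from_html(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = array('url' => $href, 'text' => trim($anchor->textContent));
        }
    }
    return $links;
}

$links = extract_links_from_html(
    '<p><a href="/a">First</a> <a>no href</a> <a href="/b">Second</a></p>'
);
print_r($links);
```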
admin at beerpla dot net
4 years ago
After seeing many complaints about certain DOMDocument shortcomings, such as bad handling of encodings and always saving HTML fragments with <html>, <head>, and DOCTYPE, I decided that a better solution is needed.

So here it is: SmartDOMDocument. You can find it at http://beerpla.net/projects/smartdomdocument/

Currently, the main highlights are:

- SmartDOMDocument inherits from DOMDocument, so it's very easy to use - just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).

- saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want - it saves HTML without adding that extra garbage that DOMDocument does.

- encoding fix - DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you - just use loadHTML() as you would normally.

- SmartDOMDocument Object As String - you can use a SmartDOMDocument object as a string which will print out its contents.
For example:
<?php
echo "Here is the HTML: $smart_dom_doc";
?>

I'm going to maintain this code and try to fix bugs as they come in.

Enjoy.
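
For the fragment problem specifically, a plain DOMDocument can get close by using the optional node argument of saveHTML() shown in the class synopsis. This is a minimal sketch of that approach, not SmartDOMDocument's implementation; the fragment and variable names are made up for illustration.

```php
<?php
// Hypothetical fragment without <html> or <body>.
$fragment = '<div class="note"><p>Hello</p></div>';

$previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($fragment);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser wraps the fragment in <html><body>; serialize only what is inside <body>.
$body = $doc->getElementsByTagName('body')->item(0);
$html = '';
foreach ($body->childNodes as $child) {
    $html .= $doc->saveHTML($child);
}

echo $html, "\n";
```

The document still contains the implied wrapper elements internally; only the serialized string leaves them out.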
Nick M
3 years ago
You may need to save all or part of a DOMDocument as an XHTML-friendly string, something compliant with both XML and HTML 4. Here's the DOMDocument class extended with a saveXHTML method:

<?php

/**
 * XHTML Document
 *
 * Represents an entire XHTML DOM document; serves as the root of the document tree.
 */
class XHTMLDocument extends DOMDocument {

  /**
   * These tags must always self-terminate. Anything else must never self-terminate.
   *
   * @var array
   */
  public $selfTerminate = array(
    'area','base','basefont','br','col','frame','hr','img','input','link','meta','param'
  );

  /**
   * saveXHTML
   *
   * Dumps the internal XML tree back into an XHTML-friendly string.
   *
   * @param DOMNode $node
   *         Use this parameter to output only a specific node rather than the entire document.
   */
  public function saveXHTML(DOMNode $node=null) {

    if (!$node) $node = $this->firstChild;

    $doc = new DOMDocument('1.0');
    $clone = $doc->importNode($node->cloneNode(false), true);
    $term = in_array(strtolower($clone->nodeName), $this->selfTerminate);
    $inner='';

    if (!$term) {
      $clone->appendChild(new DOMText(''));
      if ($node->childNodes) foreach ($node->childNodes as $child) {
        $inner .= $this->saveXHTML($child);
      }
    }

    $doc->appendChild($clone);
    $out = $doc->saveXML($clone);

    return $term ? substr($out, 0, -2) . ' />' : str_replace('><', ">$inner<", $out);

  }

}

?>

This hasn't been benchmarked, but is probably significantly slower than saveXML or saveHTML and should be used sparingly.
tloach at gmail dot com
4 years ago
For anyone else who has been having issues with formatOutput not working, here is a workaround.

Rather than just doing something like:

<?php
$outXML = $xml->saveXML();
?>

force it to reload the XML from scratch; then it will format correctly:

<?php
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
?>
cmyk777 at gmail dot com
5 years ago
This function may help to debug the current DOM element:

<?php
function dom_dump($obj) {
    if ($classname = get_class($obj)) {
        $retval = "Instance of $classname, node list: \n";
        switch (true) {
            case ($obj instanceof DOMDocument):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->saveXML($obj);
                break;
            case ($obj instanceof DOMElement):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMAttr):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                //$retval .= $obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMNodeList):
                for ($i = 0; $i < $obj->length; $i++) {
                    $retval .= "Item #$i, XPath: {$obj->item($i)->getNodePath()}\n".
                        "{$obj->item($i)->ownerDocument->saveXML($obj->item($i))}\n";
                }
                break;
            default:
                return "Instance of unknown class";
        }
    } else {
        return 'no elements...';
    }
    return htmlspecialchars($retval);
}
?>

Example usage:

<?php
$dom = new DomDocument();
$dom->load('test.xml');
$body = $dom->documentElement->getElementsByTagName('book');
// Close the <pre> element properly after the dump.
echo '<pre>'.dom_dump($body).'</pre>';
?>

Output:

Instance of DOMNodeList, node list:
Item #0, XPath: /library/book[1]
<book isbn="0345342968">
<title>Fahrenheit 451</title>
<author>R. Bradbury</author>
<publisher>Del Rey</publisher>
</book>
Item #1, XPath: /library/book[2]
<book isbn="0048231398">
<title>The Silmarillion</title>
<author>J.R.R. Tolkien</author>
<publisher>G. Allen &amp; Unwin</publisher>
</book>
Item #2, XPath: /library/book[3]
<book isbn="0451524934">
<title>1984</title>
<author>G. Orwell</author>
<publisher>Signet</publisher>
</book>
Item #3, XPath: /library/book[4]
<book isbn="031219126X">
<title>Frankenstein</title>
<author>M. Shelley</author>
<publisher>Bedford</publisher>
</book>
Item #4, XPath: /library/book[5]
<book isbn="0312863551">
<title>The Moon Is a Harsh Mistress</title>
<author>R. A. Heinlein</author>
<publisher>Orb</publisher>
</book>
PhilipWayneRollins at gmail dot com
4 years ago
If you want to use DOMDocument to create xHTML documents, here is a simple class.

Note that this is designed for creating xHTML documents from scratch, but it could easily be extended to work with existing xHTML documents. Also, this is for xHTML, not XML.

<?php
    class Document
    {
        public $doctype;
        public $head;
        public $title = 'Sensei Ninja';
        public $body;
        private $styles;
        private $metas;
        private $scripts;
        private $document;

        function __construct ( )
        {
            $this->document = new DOMDocument( );
            $this->head = $this->document->createElement( 'head', ' ' );
            $this->body = $this->document->createElement( 'body', ' ' );
        }

        public function addStyleSheet ( $url, $media='all' )
        {
            $element = $this->document->createElement( 'link' );
            // rel="stylesheet" is needed for browsers to apply the linked file.
            $element->setAttribute( 'rel', 'stylesheet' );
            $element->setAttribute( 'type', 'text/css' );
            $element->setAttribute( 'href', $url );
            $element->setAttribute( 'media', $media );
            $this->styles[] = $element;
        }

        public function addScript ( $url )
        {
            $element = $this->document->createElement( 'script', ' ' );
            $element->setAttribute( 'type', 'text/javascript' );
            $element->setAttribute( 'src', $url );
            $this->scripts[] = $element;
        }

        public function addMetaTag ( $name, $content )
        {
            $element = $this->document->createElement( 'meta' );
            $element->setAttribute( 'name', $name );
            $element->setAttribute( 'content', $content );
            $this->metas[] = $element;
        }

        public function setDescription ( $dec )
        {
            $this->addMetaTag( 'description', $dec );
        }

        public function setKeywords ( $keywords )
        {
            $this->addMetaTag( 'keywords', $keywords );
        }

        public function createElement ( $nodeName, $nodeValue=null )
        {
            return $this->document->createElement( $nodeName, $nodeValue );
        }

        public function assemble ( )
        {
            // Doctype creation (public identifier as defined for XHTML 1.0 Transitional)
            $doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';

            // Create the head element
            $title = $this->document->createElement( 'title', $this->title );
            // Add stylesheets if needed
            if ( is_array( $this->styles ))
                foreach ( $this->styles as $element )
                    $this->head->appendChild( $element );
            // Add scripts if needed
            if ( is_array( $this->scripts ))
                foreach ( $this->scripts as $element )
                    $this->head->appendChild( $element );
            // Add meta tags if needed
            if ( is_array( $this->metas ))
                foreach ( $this->metas as $element )
                    $this->head->appendChild( $element );
            $this->head->appendChild( $title );

            // Create the document
            $html = $this->document->createElement( 'html' );
            $html->setAttribute( 'xmlns', 'http://www.w3.org/1999/xhtml' );
            $html->setAttribute( 'xml:lang', 'en' );
            $html->setAttribute( 'lang', 'en' );
            $html->appendChild( $this->head );
            $html->appendChild( $this->body );

            $this->document->appendChild( $html );
            // Serialize only the <html> element so that no XML declaration ends up after the doctype.
            return $doctype . "\n" . $this->document->saveXML( $html );
        }

    }
?>

Small example

<?php
    $document = new Document( );
    $document->title = 'Hello';
    $document->addStyleSheet( 'StyleSheets/main.css' );
    $div = $document->createElement( 'div' );
    $div->nodeValue = 'Hello, world!';
    $div->setAttribute( 'style', 'color: red;' );
    $document->body->appendChild( $div );
    printf( '%s', $document->assemble( ) );
?>
danny dot nunez15 at gmail dot com
9 months ago
A simple function to grab all links in a page.

    function get_links($url) {

        // Create a new DOM Document to hold our webpage structure
        $xml = new DOMDocument();

        // Load the url's contents into the DOM

        $xml->loadHTMLFile($url);

        // Empty array to hold all links to return
        $links = array();

        //Loop through each <a> tag in the dom and add it to the link array
        foreach ($xml->getElementsByTagName('a') as $link) {
            // Use a separate variable so the $url parameter is not overwritten.
            $href = $link->getAttribute('href');
            if (!empty($href)) {
                $links[] = $href;
            }
        }

        //Return the links
        return $links;
    }