XML in PHP (version 5+)

PHP 5 ships with several XML extensions, each with its own advantages and disadvantages. Which one should you pick for a given task? Even if you already know some XML techniques and are comfortable diving into whichever extension fits, you first need to know what is available.

I have tried to collect the uses of, and the differences between, the widely used XML extensions. Please add a comment if you find something incorrect or something big missing. I feel it is not complete!

XML libraries/extensions can be classified as tree based (DOM, SimpleXML), stream based (XMLReader/XMLWriter), event based (XML Parser, which follows the SAX model) and transformation (XSL). More coarsely, they are either tree based or stream based. With a tree based parser we can move from top to bottom and back from bottom to top in the same pass. With a stream based or event based parser we can move from top to bottom only; to recheck data that has already been read, we need to traverse the document again from the top.

PHP has these XML extensions, which are the ones most commonly in use:

DOM - based on the W3C standard; it is a tree based parser.
It parses the whole document into a tree in RAM before you can work with it, so very large documents can cause memory problems. It is powerful because it provides a complete interface to all aspects of the XML specification. Because it follows the W3C standard, including its method and property names, programmers coming from other languages, such as JavaScript, can easily work with it. The price for all this is that it is a little more difficult to work with and can be tough on very large documents because of the memory use. It can validate against XML Schema, RelaxNG and DTD, and it supports namespaces well.
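
As a rough illustration of the tree based approach, here is a minimal sketch that walks down to an element, climbs back up to its parent, and then modifies the tree. The sample XML and the phone number are made up for this example; only standard DOM calls are used.

```php
<?php
// Hypothetical sample data, shaped like the address.xml used later in this post.
$xml = <<<XML
<address>
  <person><name>Ann</name><email>ann@example.com</email></person>
  <person><name>Bob</name><email>bob@example.com</email></person>
</address>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml); // the whole document is parsed into an in-memory tree

// Top to bottom: find every <email> element.
$emails = $dom->getElementsByTagName('email');
$firstEmail = $emails->item(0);

// Bottom to top: climb from the <email> to its <person>, then read a sibling.
$person = $firstEmail->parentNode;
$name = $person->getElementsByTagName('name')->item(0)->textContent;

// The tree can also be changed and serialized again.
$person->appendChild($dom->createElement('phone', '555-0100'));

echo $name, ': ', $firstEmail->textContent, "\n";
echo $dom->saveXML();
```

Moving freely in both directions like this is exactly what the stream based extensions cannot do.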

SimpleXML - really good for simple documents but not good for very complex ones. This is PHP's gift to new and experienced developers alike when they do not have to do complex XML work such as handling comments and processing instructions. It may also be unsuitable when you do not know the XML format in advance. It has no validation methods of its own: a DTD can be checked at load time with the LIBXML_DTDVALID option, and for XML Schema or RelaxNG the document can be handed to DOM with dom_import_simplexml().
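
To give a feel for why SimpleXML suits simple documents, here is a small sketch. The sample XML and the id attribute are invented for this example, and the short array syntax assumes PHP 5.4 or later.

```php
<?php
// Hypothetical sample data for illustration.
$xml = <<<XML
<address>
  <person id="1"><name>Ann</name><email>ann@example.com</email></person>
  <person id="2"><name>Bob</name><email>bob@example.com</email></person>
</address>
XML;

$address = simplexml_load_string($xml);

$names = [];
foreach ($address->person as $person) {
    // Child elements are read as properties, attributes as array keys.
    // Casting to string yields the text instead of a SimpleXMLElement object.
    $names[(string) $person['id']] = (string) $person->name;
}
print_r($names);

// When a task outgrows SimpleXML, the same document can be handed over to DOM.
$domElement = dom_import_simplexml($address);
echo $domElement->nodeName, "\n";
```

The last two lines show the usual escape hatch: dom_import_simplexml() gives a DOMElement for the same document, so the more complex work can be done the DOM way.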

XMLReader and XMLWriter - the names say what they do. Both are stream based: they do not keep the whole document in memory, so they consume far less RAM than the tree based extensions (DOM and SimpleXML). XMLReader is a pull parser, which usually makes it easier to use than SAX. Unlike DOM, which loads the whole document first and then validates it, XMLReader validates as it reads, so processing can stop at the first error. It supports all three validation methods: DTD, RelaxNG and XML Schema.
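
A minimal sketch of the streaming style, assuming made-up element names and email addresses: XMLWriter builds a document piece by piece, and XMLReader then reads it back one node at a time. It writes to memory to stay self-contained; openUri() would write to a file instead.

```php
<?php
// Write a small document with XMLWriter.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('address');
foreach (['ann@example.com', 'bob@example.com'] as $addr) {
    $writer->startElement('person');
    $writer->writeElement('email', $addr);
    $writer->endElement(); // </person>
}
$writer->endElement();     // </address>
$writer->endDocument();
$xml = $writer->outputMemory();

// Read it back with XMLReader: forward only, one node per read() call.
$reader = new XMLReader();
$reader->XML($xml);
$emails = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'email') {
        $emails[] = $reader->readString(); // text content of the current element
    }
}
$reader->close();

print_r($emails);
```

For a real file, $reader->open('address.xml') would replace the XML() call, and only the current node is held in memory while reading.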

XML Parser - an event based parser. It sees the document as a series of events and calls a callback function for each one, which gives us the chance to do something on each event. It also does not load the entire document into memory. It lets you parse an XML document but does not validate it.
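
The callback style can be sketched roughly as follows. The sample XML is invented, and the closures assume PHP 5.3 or later; on older versions named functions would be passed instead.

```php
<?php
// Hypothetical sample data for illustration.
$xml = '<address><person><email>ann@example.com</email></person>'
     . '<person><email>bob@example.com</email></person></address>';

$emails = [];
$inEmail = false;

$parser = xml_parser_create();
// Keep element names as written; by default the parser upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Event: an opening tag and a closing tag each trigger their own callback.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$inEmail, &$emails) {
        if ($name === 'email') {
            $inEmail = true;
            $emails[] = '';
        }
    },
    function ($parser, $name) use (&$inEmail) {
        if ($name === 'email') {
            $inEmail = false;
        }
    }
);

// Event: character data, which may arrive in several chunks per element.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$inEmail, &$emails) {
        if ($inEmail) {
            $emails[count($emails) - 1] .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    echo 'Parse error: ', xml_error_string(xml_get_error_code($parser)), "\n";
}

print_r($emails);
```

Note how the code has to track its own state ($inEmail) between events; the parser itself remembers nothing about where it has been.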

XSL - XSLT support is provided in PHP 5 by libxslt; PHP 4 used Sablotron and domxml. With XSLT we can create a new document from an XML document and a stylesheet (XSL) that describes how to transform it. PHP also provides a way to call PHP functions from within the XSL stylesheet during the transformation (XSLTProcessor::registerPHPFunctions()), which gives immense power.
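
Here is a small sketch of a transformation that calls a PHP function from the stylesheet. It assumes the xsl extension is enabled; the sample XML and the stylesheet are made up for this example.

```php
<?php
$xml = new DOMDocument();
$xml->loadXML('<address><person><email>ann@example.com</email></person></address>');

// The php: namespace is what makes php:function() available in the stylesheet.
$stylesheet = <<<XSL
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="address/person/email">
      <xsl:value-of select="php:function('strtoupper', string(.))"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions('strtoupper'); // expose only the functions that are needed
$proc->importStylesheet($xsl);

$result = $proc->transformToXml($xml);
echo $result, "\n";
```

Passing a function name (or an array of names) to registerPHPFunctions() keeps the stylesheet from calling arbitrary PHP functions, which matters if the stylesheet does not come from a trusted source.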

SAX - Simple API for XML (SAX) is the old event based model from PHP 4, still supported; in PHP it is what the XML Parser extension above implements. XMLReader/XMLWriter is usually the better choice, although SAX can sometimes be more efficient. It is also stream based, meaning it does not load the whole document into memory. It treats each opening and closing tag of an element as an event.

Beyond everything said above about DOM and SimpleXML, here is one more point in their favor: both also support XPath.

<?php
$s = simplexml_load_file('address.xml');
$emails = $s->xpath('/address/person/email');
foreach ($emails as $email) {
    echo $email, "\n"; // each match is a SimpleXMLElement; echo prints its text
}
?>

With DOM, we cannot run an XPath query directly on the DOM object itself. We create a DOMXPath object first.

<?php
$dom = new DOMDocument;
$dom->load('address.xml');
$xpath = new DOMXPath($dom);
$emails = $xpath->query('/address/person/email');
?>

query() returns a DOMNodeList, whose nodes can then be processed the DOM way.
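
What handling the result the DOM way looks like can be sketched as follows, again with made-up sample data. The result of query() can be walked with foreach, and evaluate() is handy when the expression yields a number or string rather than nodes.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<address><person><email>ann@example.com</email></person>'
    . '<person><email>bob@example.com</email></person></address>'
);

$xpath = new DOMXPath($dom);

// Each item in the result is an ordinary DOM node.
$found = [];
foreach ($xpath->query('/address/person/email') as $node) {
    $found[] = $node->nodeValue;
}

// evaluate() returns a typed result where possible, here a number.
$count = $xpath->evaluate('count(/address/person)');

print_r($found);
echo $count, "\n";
```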

DOM again: the DOM classes can be extended with a class of your own.

<?php
class articles extends DOMDocument {
    // custom methods go here
}
?>
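
To make the idea a bit more concrete, here is one way such a subclass might look. ArticleList and addArticle() are names invented for this sketch, not part of PHP or of any real project.

```php
<?php
// Hypothetical subclass: a document that always has an <articles> root
// and offers a helper for adding <article> children.
class ArticleList extends DOMDocument
{
    public function __construct()
    {
        parent::__construct('1.0', 'UTF-8');
        $this->appendChild($this->createElement('articles'));
    }

    public function addArticle($title)
    {
        $article = $this->createElement('article');
        // A text node escapes special characters such as & and < on output.
        $article->appendChild($this->createTextNode($title));
        return $this->documentElement->appendChild($article);
    }
}

$articles = new ArticleList();
$articles->addArticle('XML in PHP');
$articles->addArticle('DOM & SimpleXML');

echo $articles->saveXML();
```

Because the subclass is still a DOMDocument, everything else (XPath, validation, saveXML()) keeps working on it unchanged.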

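DOM also allows us to parse HTML documents that are not well-formed.
<?php
libxml_use_internal_errors(true); // malformed markup raises warnings; collect them silently
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.php.net'); // a remote URL needs allow_url_fopen enabled
$title = $dom->getElementsByTagName('title');
echo $title->item(0)->textContent;
?>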

A DOM document can also save its content as HTML, with saveHTML() or saveHTMLFile().
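
A minimal round trip, assuming a made-up snippet of sloppy HTML with unclosed tags: the parser repairs the markup while loading, and saveHTML() writes the repaired tree back out.

```php
<?php
// Hypothetical input: <p>, <body> and <html> are never closed.
$html = '<html><head><title>Demo</title></head>'
      . '<body><p>Unclosed paragraph<br>second line';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$title = $dom->getElementsByTagName('title')->item(0)->textContent;
$clean = $dom->saveHTML(); // serialized as HTML, with the missing end tags added

echo $title, "\n";
echo $clean;
```

saveHTMLFile() does the same but writes straight to a file path instead of returning a string.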



  • # 1 - by Pete

    Thanks for the fairly clear explanation of PHP’s options. I’m never sure which one to use, but generally stick to SimpleXML because I find its programmatic interface to be relatively clear. It amazes me that we live in such a connected world: the technology used in 3rd world India is the same as that used in more advanced countries!
