This function takes a string of HTML, runs it through HTML Tidy, and returns a string of XHTML. By default it is set up to process entire pages. You can pass an array of Tidy options as the second parameter; that array is then used in place of the default settings. To change the defaults themselves, just edit the array inside the function.
A quick reference for the HTML Tidy options is available here: http://tidy.sourceforge.net/docs/quickref.html The default array covers most of them; I left out only a few.
Example use of this function:
<?php
$fileAsString = file_get_contents( 'path/to/file.html' );
$cleanOutput = cleaning( $fileAsString );
echo $cleanOutput;
?>
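The second parameter can be sketched in the same way. The option values below are an illustrative assumption (a fragment-oriented setup, not the note's defaults), and the sample markup is made up. Because a supplied array is used in place of the defaults, it has to list every option you care about. The guard is only there so the snippet does nothing where cleaning() or the tidy extension is unavailable.

```php
<?php
// Hypothetical options for cleaning a fragment instead of a whole page.
$fragmentConfig = array(
    'show-body-only' => true,  // emit only the contents of <body>
    'output-xhtml'   => true,
    'indent'         => true,
    'wrap'           => 0,     // 0 disables line wrapping
);

// Made-up input with an unclosed <b> element.
$fragment = '<p>Hello <b>world</p>';

// Only call cleaning() when both the function and the extension exist.
if (function_exists('cleaning') && extension_loaded('tidy')) {
    echo cleaning($fragment, $fragmentConfig);
}
?>
```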
And here is the function itself. I hope it helps you.
<?php
// Tested on PHP 5.3.5
/* $what_to_clean is a string of HTML. $tidy_config is an optional array of
   Tidy options; when given, it is used in place of the defaults below. */
function cleaning($what_to_clean, $tidy_config = null) {
    // Default configuration: whole page in, indented strict XHTML out.
    $config = array(
        'show-body-only' => false,
        'clean' => true,
        'char-encoding' => 'utf8',
        'add-xml-decl' => true,
        'add-xml-space' => true,
        'output-html' => false,
        'output-xml' => false,
        'output-xhtml' => true,
        'numeric-entities' => false,
        'ascii-chars' => false,
        'doctype' => 'strict',
        'bare' => true,
        'fix-uri' => true,
        'indent' => true,
        'indent-spaces' => 4,
        'tab-size' => 4,
        'wrap-attributes' => true,
        'wrap' => 0,
        'indent-attributes' => true,
        'join-classes' => false,
        'join-styles' => false,
        'enclose-block-text' => true,
        'fix-bad-comments' => true,
        'fix-backslash' => true,
        'replace-color' => false,
        'wrap-asp' => false,
        'wrap-jste' => false,
        'wrap-php' => false,
        'write-back' => true,
        'drop-proprietary-attributes' => false,
        'hide-comments' => false,
        'hide-endtags' => false,
        'literal-attributes' => false,
        'drop-empty-paras' => true,
        'enclose-text' => true,
        'quote-ampersand' => true,
        'quote-marks' => false,
        'quote-nbsp' => true,
        'vertical-space' => true,
        'wrap-script-literals' => false,
        'tidy-mark' => true,
        'merge-divs' => false,
        'repeated-attributes' => 'keep-last',
        'break-before-br' => true,
    );
    // Fall back to the defaults when no configuration was supplied.
    if ($tidy_config === null || $tidy_config === '') {
        $tidy_config = $config;
    }
    $tidy = new tidy();
    // The third argument is the input/output encoding.
    return $tidy->repairString($what_to_clean, $tidy_config, 'utf8');
}
?>
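If you would rather change one or two options and keep the rest, the caller's array can be layered over a default array before it is handed to Tidy. This is only a sketch: merge_tidy_options() is a hypothetical helper that is not part of the function above, and the three-option default array is shortened for illustration. It relies on array_replace(), which is available from PHP 5.3.0.

```php
<?php
// Hypothetical helper: values in $overrides win, all other defaults are kept.
function merge_tidy_options(array $defaults, array $overrides)
{
    return array_replace($defaults, $overrides);
}

// Shortened stand-in for the full default array.
$defaults = array(
    'show-body-only' => false,
    'indent'         => true,
    'wrap'           => 0,
);

// Change a single option; 'indent' and 'wrap' stay as they were.
$merged = merge_tidy_options($defaults, array('show-body-only' => true));

var_dump($merged);
?>
```

The merged array could then be passed as the second parameter of cleaning().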
Tidy example
This simple example shows basic Tidy usage.
Example #1 Basic Tidy usage
<?php
ob_start();
?>
<html>a html document</html>
<?php
$html = ob_get_clean();
// Specify configuration
$config = array(
    'indent'       => true,
    'output-xhtml' => true,
    'wrap'         => 200
);
// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
// Output
echo $tidy;
?>
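To see what Tidy complained about while repairing, the same calls can be followed by a look at the object's errorBuffer property. This is a minimal sketch under assumptions: the sample markup is made up, and the extension check is only there so the snippet still runs where tidy is not installed.

```php
<?php
// Made-up input with mismatched tags.
$html = '<p>unclosed paragraph<div>stray</p>';

if (!extension_loaded('tidy')) {
    $report = 'tidy extension not loaded';
} else {
    $tidy = new tidy;
    $tidy->parseString($html, array('output-xhtml' => true), 'utf8');
    $tidy->cleanRepair();
    // diagnose() appends summary information to the error buffer.
    $tidy->diagnose();
    // Cast to string because the buffer can be empty when nothing was reported.
    $report = (string) $tidy->errorBuffer;
}

echo $report, "\n";
?>
```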
matthewkastor at gmail dot com
10-Aug-2011 07:00
