downloads | documentation | faq | getting help | mailing lists | licenses | wiki | reporting bugs | php.net sites | links | conferences | my php.net

search for in the

MessageFormatter> <Normalizer::isNormalized
Last updated: Fri, 05 Feb 2010

view this page in

Normalizer::normalize

normalizer_normalize

(PHP 5 >= 5.3.0, PECL intl >= 1.0.0)

Normalizer::normalize -- normalizer_normalize Normalizes the input provided and returns the normalized string

Description

Object oriented style

static string Normalizer::normalize ( string $input [, string $form = Normalizer::FORM_C ] )

Procedural style

string normalizer_normalize ( string $input [, string $form = Normalizer::FORM_C ] )

Normalizes the input provided and returns the normalized string

Parameters

input

The input string to normalize

form

One of the normalization forms.

Return Values

The normalized string or NULL if an error occurred.

Examples

Example #1 normalizer_normalize() example

<?php
$char_A_ring 
"\xC3\x85"// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 normalizer_normalize$char_A_ringNormalizer::FORM_C );
$char_2 normalizer_normalize'A' $char_combining_ring_aboveNormalizer::FORM_C );
 
echo 
urlencode($char_1);
echo 
' ';
echo 
urlencode($char_2);
?>

Example #2 OO example

<?php
$char_A_ring 
"\xC3\x85"// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 Normalizer::normalize$char_A_ringNormalizer::FORM_C );
$char_2 Normalizer::normalize'A' $char_combining_ring_aboveNormalizer::FORM_C );
 
echo 
urlencode($char_1);
echo 
' ';
echo 
urlencode($char_2);
?>

The above example will output:

%C3%85 %C3%85

See Also



add a note add a note User Contributed Notes
Normalizer::normalize
akniep at rayo dot info
30-Jul-2009 04:03
Especially when matching texts against each-other or against keywords, it is helpful to normalize the texts before.
The following function removes all diacritics (marks like accents) from a given UTF8-encoded texts and returns ASCii-text.

Be sure to have the PHP-Normalizer-extension (intl and icu) installed.

Tipp: You may also want to map the text to lower case before execute matching procedures ...

<?php

function normalizeUtf8String( $s)
{
   
// Normalizer-class missing!
   
if (! class_exists("Normalizer", $autoload = false))
        return
$original_string;
   
   
   
// maps German (umlauts) and other European characters onto two characters before just removing diacritics
   
$s    = preg_replace( '@\x{00c4}@u'    , "AE",    $s );    // umlaut Ä => AE
   
$s    = preg_replace( '@\x{00d6}@u'    , "OE",    $s );    // umlaut Ö => OE
   
$s    = preg_replace( '@\x{00dc}@u'    , "UE",    $s );    // umlaut Ü => UE
   
$s    = preg_replace( '@\x{00e4}@u'    , "ae",    $s );    // umlaut ä => ae
   
$s    = preg_replace( '@\x{00f6}@u'    , "oe",    $s );    // umlaut ö => oe
   
$s    = preg_replace( '@\x{00fc}@u'    , "ue",    $s );    // umlaut ü => ue
   
$s    = preg_replace( '@\x{00f1}@u'    , "ny",    $s );    // ñ => ny
   
$s    = preg_replace( '@\x{00ff}@u'    , "yu",    $s );    // ÿ => yu
   
   
    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
        // exmaple:  Ú => U´,  á => a`
   
$s    = Normalizer::normalize( $s, Normalizer::FORM_D );
   
   
   
$s    = preg_replace( '@\pM@u'        , "",    $s );    // removes diacritics
   
   
   
$s    = preg_replace( '@\x{00df}@u'    , "ss",    $s );    // maps German ß onto ss
   
$s    = preg_replace( '@\x{00c6}@u'    , "AE",    $s );    // Æ => AE
   
$s    = preg_replace( '@\x{00e6}@u'    , "ae",    $s );    // æ => ae
   
$s    = preg_replace( '@\x{0132}@u'    , "IJ",    $s );    // ? => IJ
   
$s    = preg_replace( '@\x{0133}@u'    , "ij",    $s );    // ? => ij
   
$s    = preg_replace( '@\x{0152}@u'    , "OE",    $s );    // Œ => OE
   
$s    = preg_replace( '@\x{0153}@u'    , "oe",    $s );    // œ => oe
   
   
$s    = preg_replace( '@\x{00d0}@u'    , "D",    $s );    // Ð => D
   
$s    = preg_replace( '@\x{0110}@u'    , "D",    $s );    // Ð => D
   
$s    = preg_replace( '@\x{00f0}@u'    , "d",    $s );    // ð => d
   
$s    = preg_replace( '@\x{0111}@u'    , "d",    $s );    // d => d
   
$s    = preg_replace( '@\x{0126}@u'    , "H",    $s );    // H => H
   
$s    = preg_replace( '@\x{0127}@u'    , "h",    $s );    // h => h
   
$s    = preg_replace( '@\x{0131}@u'    , "i",    $s );    // i => i
   
$s    = preg_replace( '@\x{0138}@u'    , "k",    $s );    // ? => k
   
$s    = preg_replace( '@\x{013f}@u'    , "L",    $s );    // ? => L
   
$s    = preg_replace( '@\x{0141}@u'    , "L",    $s );    // L => L
   
$s    = preg_replace( '@\x{0140}@u'    , "l",    $s );    // ? => l
   
$s    = preg_replace( '@\x{0142}@u'    , "l",    $s );    // l => l
   
$s    = preg_replace( '@\x{014a}@u'    , "N",    $s );    // ? => N
   
$s    = preg_replace( '@\x{0149}@u'    , "n",    $s );    // ? => n
   
$s    = preg_replace( '@\x{014b}@u'    , "n",    $s );    // ? => n
   
$s    = preg_replace( '@\x{00d8}@u'    , "O",    $s );    // Ø => O
   
$s    = preg_replace( '@\x{00f8}@u'    , "o",    $s );    // ø => o
   
$s    = preg_replace( '@\x{017f}@u'    , "s",    $s );    // ? => s
   
$s    = preg_replace( '@\x{00de}@u'    , "T",    $s );    // Þ => T
   
$s    = preg_replace( '@\x{0166}@u'    , "T",    $s );    // T => T
   
$s    = preg_replace( '@\x{00fe}@u'    , "t",    $s );    // þ => t
   
$s    = preg_replace( '@\x{0167}@u'    , "t",    $s );    // t => t
   
    // remove all non-ASCii characters
   
$s    = preg_replace( '@[^\0-\x80]@u'    , "",    $s );
   
   
   
// possible errors in UTF8-regular-expressions
   
if (empty($s))
        return
$original_string;
    else
        return
$s;
}
?>

The above function is mainly based on the following article:
http://ahinea.com/en/tech/accented-translate.html

MessageFormatter> <Normalizer::isNormalized
Last updated: Fri, 05 Feb 2010
 
 
show source | credits | stats | sitemap | contact | advertising | mirror sites