Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing character, but not (by default) newline. If the PCRE_DOTALL option is set, then dots match newlines as well. The handling of dot is entirely independent of the handling of circumflex and dollar, the only relationship being that they both involve newline characters. Dot has no special meaning in a character class.
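The behaviour described above can be sketched with a few calls to preg_match(). This is a minimal illustration, not part of the original page; it assumes the build's newline convention treats "\n" as a newline (the usual case), and the variable names are made up for the example. The s pattern modifier is PHP's way of setting PCRE_DOTALL.

```php
<?php
// Subject with a newline between the two words.
$subject = "foo\nbar";

// Without PCRE_DOTALL the dot does not match the newline, so there is no match.
$withoutDotall = preg_match('/foo.bar/', $subject);

// The "s" modifier sets PCRE_DOTALL, so the dot also matches the newline.
$withDotall = preg_match('/foo.bar/s', $subject);

// The dot matches non-printing characters such as a tab.
$matchesTab = preg_match('/a.b/', "a\tb");

// Inside a character class the dot is a literal full stop.
$classMatchesX   = preg_match('/^[.]$/', 'x');
$classMatchesDot = preg_match('/^[.]$/', '.');

var_dump($withoutDotall, $withDotall, $matchesTab, $classMatchesX, $classMatchesDot);
```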

\C can be used to match a single byte. This is useful in UTF-8 mode, where a dot matches a whole character, which may consist of multiple bytes.
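As a rough sketch of the difference, the following compares the dot and \C against a two-byte UTF-8 character. It assumes a PCRE build that permits \C in UTF-8 mode (some builds can be configured to forbid it); the variable names are illustrative only.

```php
<?php
// "é" (U+00E9) encoded in UTF-8 is the two bytes 0xC3 0xA9.
$subject = "\xC3\xA9";

// In UTF-8 mode (the "u" modifier) the dot consumes the whole character.
$dotMatchesChar = preg_match('/^.$/u', $subject);

// \C consumes exactly one byte, so a single \C cannot span the character...
$oneByte = preg_match('/^\C$/u', $subject);

// ...but two of them cover both of its bytes.
$twoBytes = preg_match('/^\C\C$/u', $subject);

var_dump($dotMatchesChar, $oneByte, $twoBytes);
```

Because \C can stop in the middle of a character, anything captured with it may not be valid UTF-8 on its own, so it is best kept for cases where byte-level matching is really intended.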

User Contributed Notes 1 note

3 years ago

        preg_match_all("/<img.*>/", $htmlfile, $match);

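Since PCRE_DOTALL is not used, this pattern is not expected to match across multiple lines. However, in some cases it can, depending on the PCRE default settings and your data ($htmlfile). The problem is that the default newline convention is fixed when PCRE is built, so some builds recognize only one sequence (for example, LF) as a newline, and the dot will then match other line-ending characters, such as a lone CR.
To fix this, use: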

        preg_match_all("/(*ANY)<img.*>/", $htmlfile, $match);

Now, any character that could possibly be seen as a newline will be interpreted as a newline by PCRE. The (*ANY) option must appear at the very start of the pattern.
NOTE: This option has been available since PCRE version 7.3.
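A small sketch of the effect, using a made-up subject in which the line ends with a lone carriage return. What the first pattern returns depends on the newline convention your PCRE build uses, so only the (*ANY) result is predictable here; $htmlfile's contents are an assumption for the example.

```php
<?php
// Hypothetical data: the first "line" ends with a bare CR rather than LF.
$htmlfile = "<img src=a>\r<p>text</p>";

// Default newline convention: if only LF counts as a newline, the dot
// matches the CR and the greedy .* can run on into the next line.
preg_match_all("/<img.*>/", $htmlfile, $default);

// (*ANY) makes PCRE treat CR (and every other newline sequence) as a
// newline, so the dot stops at the end of the first line.
preg_match_all("/(*ANY)<img.*>/", $htmlfile, $strict);

var_dump($default[0], $strict[0]);
```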