What the /isU modifiers mean in regular expressions

Pattern Modifiers

The currently available PCRE modifiers are listed below. The names in parentheses are the internal PCRE names for these modifiers. Spaces and newlines are ignored in modifiers; any other character causes an error.

i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower case letters.
m (PCRE_MULTILINE)
By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in the subject string, or no occurrences of ^ or $ in the pattern, setting this modifier has no effect.
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
x (PCRE_EXTENDED)
If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x modifier, and makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.

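Only preg_replace() uses this modifier; it is ignored by other PCRE functions. The e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; preg_replace_callback() is the replacement.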

A (PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if the m modifier is set. There is no equivalent to this modifier in Perl.
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
U (PCRE_UNGREEDY)
This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern; for a single quantifier, the same non-greedy behaviour is obtained by putting a question mark behind it (e.g. .*?).
X (PCRE_EXTRA)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. There are at present no other features controlled by this modifier.
J (PCRE_INFO_JCHANGED)
The (?J) internal option setting changes the local PCRE_DUPNAMES option. It allows duplicate names for subpatterns.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
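The effect of several of the modifiers above (i, m, s, D and A) can be checked with a few one-line matches. This is only a minimal sketch: the subject string and the variable names are made up for illustration and do not come from the manual text.

```php
<?php
// Hypothetical subject: two lines, ending with a newline.
$text = "Hello\nworld\n";

// i: letters match regardless of case.
$withI    = preg_match('/hello/i', $text);      // 1
$withoutI = preg_match('/hello/', $text);       // 0

// m: ^ and $ also match at line boundaries inside the string.
$withM    = preg_match_all('/^\w+$/m', $text);  // 2 ("Hello" and "world")
$withoutM = preg_match_all('/^\w+$/', $text);   // 0

// s: the dot also matches the newline between the two words.
$withS    = preg_match('/Hello.world/s', $text); // 1
$withoutS = preg_match('/Hello.world/', $text);  // 0

// D: $ no longer matches before the trailing newline.
$withD    = preg_match('/world$/D', $text);     // 0
$withoutD = preg_match('/world$/', $text);      // 1

// A: the match must start at the beginning of the subject.
$withA    = preg_match('/world/A', $text);      // 0

var_dump($withI, $withM, $withS, $withD, $withA);
```

Each pair differs only in the modifier, so any change in the result comes from that modifier alone.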
Example code
 
<?php
echo '<pre>';

$str = '<ul>hello world<li>hi</li><li>hello</li></ul>';

// No modifier: .* is greedy and runs to the last </li>.
$pattern = '~<li>.*</li>~';
preg_match($pattern, $str, $matches);
var_dump($matches);
/*
array(1) {
  [0]=>
  string(25) "<li>hi</li><li>hello</li>"
}
*/

// No modifier: .*? is non-greedy and stops at the first </li>.
$pattern1 = '~<li>.*?</li>~';
preg_match($pattern1, $str, $matches1);
var_dump($matches1);
/*
array(1) {
  [0]=>
  string(11) "<li>hi</li>"
}
*/

// With U: .* becomes non-greedy.
$pattern2 = '~<li>.*</li>~U';
preg_match($pattern2, $str, $matches2);
var_dump($matches2);
/*
array(1) {
  [0]=>
  string(11) "<li>hi</li>"
}
*/

// With U: .*? becomes greedy again.
$pattern3 = '~<li>.*?</li>~U';
preg_match($pattern3, $str, $matches3);
var_dump($matches3);
/*
array(1) {
  [0]=>
  string(25) "<li>hi</li><li>hello</li>"
}
*/
 

As the output shows, adding U turns quantifiers that were greedy into non-greedy ones, while quantifiers that were non-greedy become greedy.
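The three modifiers in the title can also be combined. The sketch below uses a made-up input string (mixed-case tags, one item spanning two lines) to show what each of i, s and U contributes, and compares the result with an explicit lazy quantifier and with the inline (?U) setting. The variable names are illustrative only.

```php
<?php
// Hypothetical input: upper-case tags and an item containing a newline.
$html = "<UL><LI>first\nitem</LI><li>second</li></UL>";

// i: <li> also matches <LI>.
// s: the dot crosses the newline inside the first item.
// U: .* stops at the first closing tag instead of the last one.
preg_match_all('~<li>(.*)</li>~isU', $html, $items);

// Same result without U, using an explicit lazy quantifier.
preg_match_all('~<li>(.*?)</li>~is', $html, $lazy);

// Same result with U switched on inside the pattern.
preg_match_all('~(?U)<li>(.*)</li>~is', $html, $inline);

var_dump($items[1]);
/*
array(2) {
  [0]=>
  string(10) "first
item"
  [1]=>
  string(6) "second"
}
*/
```

Since U silently flips the meaning of every quantifier in the pattern, writing .*? explicitly is often easier to read than relying on the modifier.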

Posted on 2010-12-31 17:44 by lateCpp. Category: PHP

