How to get the pinyin of Chinese characters:
http://blog.csdn.net/cpu88/archive/2004/11/26/195315.aspx
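Steps:
//Getting the pinyin (including characters with multiple readings)
A: Use the IME Generator (Windows 2000), "C:\Program Files\Windows NT\Accessories\Imegen.exe",
to reverse-convert the pinyin input method file C:\WINNT\SYSTEM32\WINPY.MB.
This produces a file C:\WINNT\SYSTEM32\WINPY.txt (the "WINPY.txt file").
B: WINPY.txt contains a list of more than 50,000 character/pinyin entries. Excluding phrases, it covers more than 20,000 characters (including characters with multiple readings).
C: Convert each character into some encoding. You can design your own encoding scheme, as long as every character maps to exactly one code (the "encoding scheme"),
for example byte[] uniCode = new String(temp).getBytes("GB2312");
Convert every character in WINPY.txt to its code. This gives a table mapping character codes to pinyin (the "code table"):
XXXX0,a //XXXX0 is the code of some character
XXYX2,o //XXYX2 is the code of some character
D: Sort the code table by code value.
Split the code table into groups (to make lookups cheaper) and record a marker for where each group starts.
E: To look up a character's pinyin, encode the character (with your own encoding scheme).
Look the code up in the code table to get the pinyin. The lookup searches only one group of the code table instead of all codes, so it is very fast.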
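Steps D and E can be sketched in C++ roughly as follows. This is a hypothetical illustration, not the original author's code: it assumes 16-bit codes (for instance two-byte GB2312 values), groups the table by the code's high byte, and the names `Entry` and `PinyinTable` are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One row of the "code table": a character's code and one pinyin reading.
// Assumption: codes are 16-bit values and groups are keyed by the high byte.
struct Entry {
    std::uint16_t code;
    std::string pinyin;
};

class PinyinTable {
public:
    explicit PinyinTable(std::vector<Entry> entries) : entries_(std::move(entries)) {
        // Step D: sort by code; stable_sort keeps multiple readings in file order.
        std::stable_sort(entries_.begin(), entries_.end(),
                         [](const Entry& a, const Entry& b) { return a.code < b.code; });
        // Group h holds codes 0xhh00..0xhhFF and spans
        // [groupStart_[h], groupStart_[h + 1]) of the sorted table.
        groupStart_.resize(257);
        for (std::uint32_t h = 0; h <= 256; ++h)
            groupStart_[h] = static_cast<std::size_t>(
                firstAtLeast(entries_.cbegin(), entries_.cend(), h << 8) - entries_.cbegin());
    }

    // Step E: all readings for `code` (empty if unknown), searching only
    // the code's own group instead of the whole table.
    std::vector<std::string> lookup(std::uint16_t code) const {
        const std::uint32_t h = code >> 8;
        auto first = entries_.begin() + static_cast<std::ptrdiff_t>(groupStart_[h]);
        auto last = entries_.begin() + static_cast<std::ptrdiff_t>(groupStart_[h + 1]);
        std::vector<std::string> readings;
        for (auto it = firstAtLeast(first, last, code); it != last && it->code == code; ++it)
            readings.push_back(it->pinyin);
        return readings;
    }

private:
    using Iter = std::vector<Entry>::const_iterator;

    // Binary search: first entry in [first, last) whose code is >= value.
    static Iter firstAtLeast(Iter first, Iter last, std::uint32_t value) {
        return std::lower_bound(first, last, value,
                                [](const Entry& e, std::uint32_t v) { return e.code < v; });
    }

    std::vector<Entry> entries_;
    std::vector<std::size_t> groupStart_;
};
```

Usage: build the table from the pairs parsed out of WINPY.txt, then call `lookup` with a character's code; a character with several readings returns several strings. Any codes used to try it out should be treated as placeholders rather than verified GB2312 values.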
//Getting initials: e.g. '北京' (Beijing) gives 'bj', '呆子' gives 'd[a]z' //multiple readings
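A hedged sketch of the initials step: given each character's readings (for example as returned by the code-table lookup in step E), take the first letters. The bracket notation for extra initials is inferred from the 'd[a]z' example and may not be exactly what the original author used; the function name `initials` is hypothetical.

```cpp
#include <string>
#include <vector>

// Builds an initials string from each character's pinyin readings, e.g.
// {{"bei"}, {"jing"}} -> "bj" and {{"dai", "ai"}, {"zi"}} -> "d[a]z".
// Assumed notation: the first distinct initial is written plainly and any
// further distinct initials of the same character go in brackets.
std::string initials(const std::vector<std::vector<std::string>>& readingsPerChar) {
    std::string out;
    for (const auto& readings : readingsPerChar) {
        std::string distinct;  // distinct first letters, in reading order
        for (const auto& r : readings)
            if (!r.empty() && distinct.find(r[0]) == std::string::npos)
                distinct += r[0];
        if (distinct.empty())
            continue;  // no known reading: skip the character
        out += distinct[0];
        if (distinct.size() > 1)
            out += "[" + distinct.substr(1) + "]";
    }
    return out;
}
```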
//Sorting: once you have the pinyin, you can sort with any of the usual sorting methods.
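How to integrate Lua scripts into C++ (LuaPlus):
http://ly4cn.teeta.com/blog/data/44939.html
Reportedly it cannot call virtual functions. A bit confusing; I'll look into it again when I have time.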
http://luaplus.org/
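Converting a string to other formats:
CString timestr = "2000年04月05日";
int a, b, c;
// MBCS build assumed (use _stscanf with _T("...") in a Unicode build). The LPCTSTR cast
// reads the string without a GetBuffer/ReleaseBuffer pair; sscanf returns 3 on success.
sscanf((LPCTSTR)timestr, "%d年%d月%d日", &a, &b, &c);
CTime time(a, b, c, 0, 0, 0);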
sscanf is standard ANSI C. I really was ill-informed...
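For code without MFC, the same sscanf idea works with a plain std::tm instead of CTime. A minimal sketch, assuming the input and the source file share one multibyte encoding (e.g. both UTF-8), since sscanf matches the literal 年/月/日 bytes one by one; `parseChineseDate` is a made-up name.

```cpp
#include <cstdio>
#include <ctime>

// Parse "YYYY年MM月DD日" with std::sscanf and fill a std::tm.
bool parseChineseDate(const char* text, std::tm& out) {
    int year = 0, month = 0, day = 0;
    // sscanf returns how many %d fields it converted; anything but 3 is a failure.
    if (std::sscanf(text, "%d年%d月%d日", &year, &month, &day) != 3)
        return false;
    if (month < 1 || month > 12 || day < 1 || day > 31)
        return false;  // coarse range check only; month lengths are not validated
    out = std::tm{};
    out.tm_year = year - 1900;  // std::tm counts years from 1900...
    out.tm_mon = month - 1;     // ...and months from 0
    out.tm_mday = day;
    return true;
}
```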
RMI for c++?:
http://www.codeproject.com/threads/RMI_For_Cpp.asp
I don't need this at the moment anyway.
Sound engine audiere, cross-platform:
Looks pretty good.
http://www.cppblog.com/gogoplayer/archive/2006/11/29/15763.html
http://sourceforge.net/projects/audiere/
wxWidgets' wxSound supports Windows and Unix.
Other sound libraries:
http://www.opensound.com/oss.html
http://www.libsdl.org/
Why You Should Turn Down That Job Offer
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005602&pageNumber=1
You will be the fifth person to have held the job in the past three years.
Why is this job vacant?
Is the turnover rate high for this position?
What's typically the next career step for those with this job?
You will clash with the corporate culture.
You will be bored -- or overwhelmed -- in the role.
You will not be able to move forward.
These may not be reason enough to turn down a job:
You will earn less than you did before.
You will be in the car for two hours each day.
You will receive a “demotion” in title.
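Computing the maximum and minimum int values
http://www.cppblog.com/Winux32/archive/2006/12/01/15853.html
Normally we use the predefined macros INT_MAX, INT_MIN and UINT_MAX from the CRT header <limits.h> for the maximum and minimum int values. Below are ways to compute these values instead; other data types work the same way.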
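unsigned int GetUnsignedIntMax()
{
    return ~0u;  // all bits set; ~0u avoids relying on how the int ~0 converts
}
signed int GetSignedIntMax()
{
    return static_cast<int>(~0u >> 1);  // all bits set except the sign bit
}
signed int GetSignedIntMin()
{
    const int max = static_cast<int>(~0u >> 1);
    // -1 ends in binary ...11 only in two's complement
    // (one's complement: ...10, sign-magnitude: ...01).
    if ((-1 & 3) == 3)
        return -max - 1;  // two's complement: one more negative value than positive
    else
        return -max;      // one's complement / sign-magnitude: symmetric range
}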
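A brief explanation: the first two need little comment (all bits set, then the same with the sign bit cleared). The last one has to take into account whether the machine uses two's complement or one's complement.
In the former case there is the identity ~X + 1 = -X, i.e. inverting the bits of a number and adding one gives its negation. With X = INT_MAX this gives -INT_MAX = ~INT_MAX + 1 = INT_MIN + 1, so INT_MIN = -INT_MAX - 1; in one's complement the range is symmetric and INT_MIN = -INT_MAX.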
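A small check of the identity ~X + 1 = -X, as a hedged sketch. The arithmetic is done on unsigned int, where wrap-around is well defined, and the conversion back to int is modular on two's-complement platforms (guaranteed from C++20). The function names are made up, and the second one only yields INT_MIN on two's-complement hardware.

```cpp
#include <climits>

// ~X + 1 computed without signed overflow. Note that for INT_MIN the
// result wraps back to INT_MIN, because +2^31 is not representable.
int negateByComplement(int x) {
    return static_cast<int>(~static_cast<unsigned int>(x) + 1u);
}

// On a two's-complement machine this equals INT_MIN from <climits>:
// the range has one more negative value than positive ones.
int signedIntMinTwosComplement() {
    return -static_cast<int>(~0u >> 1) - 1;
}
```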
How Skype & Co. get round firewalls. Cool..
http://www.heise-security.co.uk/articles/82481
Copying it here first; I'll read it properly later.
The hole trick
How Skype & Co. get round firewalls
Peer-to-peer software applications are a
network administrator's nightmare. In order to be able
to exchange packets with their counterpart as directly
as possible they use subtle tricks to punch holes in
firewalls, which shouldn't actually be letting in
packets from the outside world.
Increasingly, computers are positioned behind firewalls
to protect systems from internet threats. Ideally, the
firewall function will be performed by a router, which
also translates the PC's local network address to the
public IP address (Network Address Translation, or
NAT). This means an attacker cannot directly address the
PC from the outside - connections have to be
established from the inside.
This is of course a problem when two computers behind
NAT firewalls need to talk directly to each other -
if, for example, their users want to call each other
using Voice over IP (VoIP). The dilemma is clear -
whichever party calls the other, the recipient's
firewall will decline the apparent attack and will
simply discard the data packets. The telephone call
doesn't happen. Or at least that's what a network
administrator would expect.
Punched
But anyone who has used the popular internet telephony
software Skype knows that it works as smoothly behind a
NAT firewall as it does if the PC is connected directly
to the internet. The reason for this is that the
inventors of Skype and similar software have come up
with a solution.
Naturally every firewall must also let packets through
into the local network - after all the user wants to
view websites, read e-mails, etc. The firewall must
therefore forward the relevant data packets from
outside, to the workstation computer on the
LAN. However it only does so, when it is convinced that
a packet represents the response to an outgoing data
packet. A NAT router therefore keeps tables of which
internal computer has communicated with which external
computer and which ports the two have used.
The trick used by VoIP software consists of persuading
the firewall that a connection has been
established, to which it should allocate subsequent
incoming data packets. The fact that audio data for
VoIP is sent using the connectionless UDP protocol
acts to Skype's advantage. In contrast to TCP, which
includes additional connection information in each
packet, with UDP, a firewall sees only the addresses
and ports of the source and destination systems. If,
for an incoming UDP packet, these match an NAT table
entry, it will pass the packet on to an internal
computer with a clear conscience.
Switching
The switching server, with which both ends of a call
are in constant contact, plays an important role when
establishing a connection using Skype. This occurs via
a TCP connection, which the clients themselves
establish. The Skype server therefore always knows
under what address a Skype user is currently available
on the internet. Where possible the actual telephone
connections do not run via the Skype server; rather,
the clients exchange data directly.
Let's assume that Alice wants to call her friend
Bob. Her Skype client tells the Skype server that she
wants to do so. The Skype server already knows a bit
about Alice. From the incoming query it sees that Alice
is currently registered at the IP address 1.1.1.1 and a
quick test reveals that her audio data always comes
from UDP port 1414. The Skype server passes this
information on to Bob's Skype client, which, according
to its database, is currently registered at the IP
address 2.2.2.2 and which, by preference uses UDP port
2828.
Bob's Skype program then punches a hole in its own
network firewall: It sends a UDP packet to 1.1.1.1 port
1414. This is discarded by Alice's firewall, but
Bob's firewall doesn't know that. It now thinks that
anything which comes from 1.1.1.1 port 1414 and is
addressed to Bob's IP address 2.2.2.2 and port 2828 is
legitimate - it must be the response to the query which
has just been sent.
Now the Skype server passes Bob's coordinates on to
Alice, whose Skype application attempts to contact Bob
at 2.2.2.2:2828. Bob's firewall sees the recognised
sender address and passes the apparent response on to
Bob's PC - and his Skype phone rings.
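To make the Alice/Bob exchange concrete, here is a toy C++ model of the NAT table logic. It is a hedged sketch, not how Skype or any real firewall is implemented: the class name `UdpNatFirewall` is hypothetical, it assumes a NAT that keeps the client's source port (as the article reports for Linux), and it ignores address rewriting and entry timeouts.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

class UdpNatFirewall {
public:
    // Outgoing packet from the LAN: remember the flow. This is the "hole".
    void sendOut(std::uint16_t localPort, const std::string& remoteIp, std::uint16_t remotePort) {
        flows_.emplace(localPort, remoteIp, remotePort);
    }

    // Incoming packet: pass it on only if source and destination match a
    // recorded outgoing flow, i.e. it looks like a response.
    bool acceptIn(const std::string& fromIp, std::uint16_t fromPort, std::uint16_t toLocalPort) const {
        return flows_.count(std::make_tuple(toLocalPort, fromIp, fromPort)) != 0;
    }

private:
    std::set<std::tuple<std::uint16_t, std::string, std::uint16_t>> flows_;
};
```

Usage: before Bob punches, `bobFw.acceptIn("1.1.1.1", 1414, 2828)` is false; after `bobFw.sendOut(2828, "1.1.1.1", 1414)` (the packet Alice's firewall discards), Alice's packet from 1.1.1.1:1414 to port 2828 is accepted.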
Doing the rounds
This description is of course somewhat simplified - the
details depend on the specific properties of the
firewalls used. But it corresponds in principle to our
observations of the process of establishing a
connection between two Skype clients, each of which was
behind a Linux firewall. The firewalls were configured
with NAT for a LAN and permitted outgoing UDP traffic.
Linux' NAT functions have the VoIP friendly property
of, at least initially, not changing the ports of
outgoing packets. The NAT router merely replaces the
private, local IP address with its own address - the
UDP source port selected by Skype is retained. Only
when multiple clients on the local network use the same
source port does the NAT router stick its oar in and
reset the port to a previously unused value. This is
because each set of two IP addresses and ports must be
able to be unambiguously assigned to a connection
between two computers at all times. The router will
subsequently have to reconstruct the internal IP
address of the original sender from the response
packet's destination port.
Other NAT routers will try to assign ports in a
specific range, for example ports from 30,000 onwards,
and translate UDP port 1414, if possible, to
31414. This is, of course, no problem for Skype - the
procedure described above continues to work in a
similar manner without limitations.
It becomes a little more complicated if a firewall
simply assigns ports in sequence, like Check Point's
FireWall-1: the first connection is assigned 30001,
the next 30002, etc. The Skype server knows that Bob is
talking to it from port 31234, but the connection to
Alice will run via a different port. But even here
Skype is able to outwit the firewall. It simply runs
through the ports above 31234 in sequence, hoping at
some point to stumble on the right one. But if this
doesn't work first go, Skype doesn't give up. Bob's
Skype opens a new connection to the Skype server, the
source port of which is then used for a further
sequence of probes.
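The port-guessing trick for firewalls that hand out ports in sequence boils down to a probe loop. In this hedged sketch `isOpen` is a hypothetical callback standing in for "a probe to this port got through", the window size is an assumption, and none of it is Skype's actual code.

```cpp
#include <cstdint>
#include <optional>

// Probe observedPort+1, observedPort+2, ... up to `window` ports and return
// the first one that answers, or nothing if the window is exhausted.
template <typename Probe>
std::optional<std::uint16_t> findOpenPort(std::uint16_t observedPort, int window, Probe isOpen) {
    for (int offset = 1; offset <= window; ++offset) {
        const int candidate = observedPort + offset;
        if (candidate > 65535)
            break;  // ran off the end of the port range
        if (isOpen(static_cast<std::uint16_t>(candidate)))
            return static_cast<std::uint16_t>(candidate);
    }
    return std::nullopt;  // per the text, Skype would retry from a fresh server connection
}
```

Usage: `findOpenPort(31234, 10, probe)` tries 31235 through 31244, matching the "ports above 31234" described above.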
Nevertheless, in very active networks Alice may not
find the correct, open port. The same also applies for
a particular type of firewall, which assigns every new
connection to a random source port. The Skype server is
then unable to tell Alice where to look for a suitable
hole in Bob's firewall.
However, even then, Skype doesn't give up. In such
cases a Skype server is then used as a relay. It
accepts incoming connections from both Alice and Bob
and relays the packets onwards. This solution is always
possible, as long as the firewall permits outgoing UDP
traffic. It involves, however, an additional load on
the infrastructure, because all audio data has to run
through Skype's servers. The extended packet transmission
times can also result in an unpleasant delay.
Use of the procedure described above is not limited to
Skype and is known as "UDP hole punching". Other
network services such as the Hamachi gaming VPN
application, which relies on peer-to-peer communication
between computers behind firewalls, use similar
procedures. A more developed form has even made it to
the rank of a standard - RFC 3489 "Simple Traversal of UDP
through NAT" (STUN) describes a protocol with which two
STUN clients can, in many cases, get around the restrictions
of NAT with the help of a STUN server. The
draft Traversal Using Relay NAT (TURN) protocol describes a possible
standard for relay servers.
DIY hole punching
With a few small utilities, you can try out UDP hole
punching for yourself. The tools required, hping2 and
netcat, can be found in most Linux
distributions. The computer local sits behind a
stateful Linux firewall (local-fw)
which only permits outgoing (UDP) connections. For
simplicity, in our test the test computer
remote was connected directly to the internet
with no firewall.
Firstly start a UDP listener on UDP port 14141 on the
local/1 console behind the firewall:
local/1# nc -u -l -p 14141
An external computer "remote" then attempts to contact it.
remote# echo "hello" | nc -p 53 -u local-fw 14141
However, as expected nothing is received on
local/1 and, thanks to the firewall, nothing
is returned to remote. Now on a second
console, local/2, hping2, our universal tool
for generating IP packets, punches a hole in the
firewall:
local/2# hping2 -c 1 -2 -s 14141 -p 53 remote
As long as remote is behaving itself, it will
send back a "port unreachable" response via ICMP -
however this is of no consequence. On the second
attempt
remote# echo "hello" | nc -p 53 -u local-fw 14141
the netcat listener on console local/1 then
coughs up a "hello" - the UDP packet from outside has
passed through the firewall and arrived at the computer
behind it.
Network administrators who do not appreciate this sort
of hole in their firewall and are worried about abuse,
are left with only one option - they have to block
outgoing UDP traffic, or limit it to essential
individual cases. UDP is not required for normal
internet communication anyway - the web, e-mail and
suchlike all use TCP. Streaming protocols may, however,
encounter problems, as they often use UDP because of
the reduced overhead.
Astonishingly, hole punching also works with TCP. After
an outgoing SYN packet the firewall / NAT router will
forward incoming packets with suitable IP addresses and
ports to the LAN even if they fail to confirm, or
confirm the wrong sequence number (ACK). Linux
firewalls at least, clearly fail to evaluate this
information consistently. Establishing a TCP connection
in this way is, however, not quite so simple, because
Alice does not have the sequence number sent in Bob's
first packet. The packet containing this information
was discarded by her firewall.
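Recommended hashCode algorithm
From "Effective Java":
1. int result = 17;
2. For each significant field f (each field used in equals), compute an int c:
boolean : c = f ? 0 : 1;
byte, int, char, short : c = (int)f;
long : c = (int)(f ^ (f >>> 32));
float : c = Float.floatToIntBits(f);
double : long l = Double.doubleToLongBits(f); c = (int)(l ^ (l >>> 32));
other references : c = (f == null) ? 0 : f.hashCode();
3. Fold each c into the result with result = 37 * result + c; then return result.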
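Since the other code in these notes is C++, here is a hedged C++ transliteration of the recipe for a made-up class `PhoneRecord` (all names hypothetical). It keeps Java's 32-bit wrap-around by using std::uint32_t. The memcpy bit extraction matches Double.doubleToLongBits except that Java also collapses all NaNs to one value, and the string member uses std::hash rather than Java's String.hashCode, so the numbers will not equal a Java implementation's.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

struct PhoneRecord {
    std::int16_t areaCode;
    std::int64_t number;
    double rate;
    bool active;
    std::string owner;
};

// long: (int)(f ^ (f >>> 32)), folding the high 32 bits into the low 32.
std::uint32_t hashLong(std::int64_t f) {
    const std::uint64_t u = static_cast<std::uint64_t>(f);
    return static_cast<std::uint32_t>(u ^ (u >> 32));  // >> on unsigned is Java's >>>
}

// double: raw IEEE-754 bits, then folded like a long.
std::uint32_t hashDouble(double f) {
    std::uint64_t bits;
    static_assert(sizeof bits == sizeof f, "expects a 64-bit double");
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint32_t>(bits ^ (bits >> 32));
}

std::uint32_t hashCode(const PhoneRecord& r) {
    std::uint32_t result = 17;                                      // step 1
    result = 37 * result + static_cast<std::uint32_t>(r.areaCode);  // short: (int)f
    result = 37 * result + hashLong(r.number);
    result = 37 * result + hashDouble(r.rate);
    result = 37 * result + (r.active ? 0u : 1u);                    // boolean, as listed above
    // reference field: delegate to the member's own hash function
    result = 37 * result + static_cast<std::uint32_t>(std::hash<std::string>{}(r.owner));
    return result;
}
```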
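Regular expression syntax
I searched through every article on regular expressions in MSDN 2002 and couldn't find the regular expression syntax anywhere, which drove me mad.
So I'm pasting it here. Reposted from someone's repost of a repost of a repost....
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract the substrings that satisfy some condition.
When listing a directory, the *.txt in dir *.txt or ls *.txt
is not a regular expression, because * there means something different from * in a regular expression.
To make things easier to understand and remember, we start with some concepts; a summary table of all special characters and character combinations comes later, followed by some examples that illustrate the concepts.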
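Regular expressions
A regular expression is a text pattern made up of ordinary characters (for example the letters a to z) and special characters (called metacharacters). The regular expression acts as a template against which the string being searched is matched.
You can construct a regular expression by placing the components of the pattern between a pair of delimiters, i.e. /expression/.
Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks and some other symbols.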
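Non-printing characters
Character | Meaning |
\cx | Matches the control character indicated by x. For example, \cM matches Control-M or a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character. |
\f | Matches a form-feed character. Equivalent to \x0c and \cL. |
\n | Matches a newline character. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return character. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab character. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab character. Equivalent to \x0b and \cK. |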
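Special characters
Special characters are characters with a special meaning, such as the * in "*.txt" mentioned above, which simply stands for any string. To find files whose names contain a *, you have to escape the *, i.e. put a \ in front of it: ls \*.txt. Regular expressions have the following special characters.
Special character | Description |
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$. |
( ) | Mark the start and end of a subexpression. A subexpression can be captured for later use. To match these characters, use \( and \). |
* | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
+ | Matches the preceding subexpression one or more times. To match the + character, use \+. |
. | Matches any single character except the newline character \n. To match ., use \. |
[ | Marks the start of a bracket expression. To match [, use \[. |
? | Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?. |
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline, the sequence '\\' matches "\", and '\(' matches "(". |
^ | Matches the start of the input string, except inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^. |
{ | Marks the start of a quantifier expression. To match {, use \{. |
| | Indicates a choice between two items. To match |, use \|. |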
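The escaping rule in the table can be automated. Here is a small C++ sketch against std::regex's default ECMAScript grammar (close to the JScript syntax documented here) that backslash-escapes every special character so arbitrary text is matched literally, like the \* in ls \*.txt. The function name `escapeRegex` is made up.

```cpp
#include <regex>
#include <string>

// Prefix every special character from the table with a backslash.
std::string escapeRegex(const std::string& text) {
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}
```

Usage: `std::regex(escapeRegex("a*b.txt"))` matches the literal text "a*b.txt", whereas the unescaped pattern would also match "aaab.txt" or "bxtxt".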
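Regular expressions are built the same way as arithmetic expressions: small expressions are combined with metacharacters and operators into larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
Quantifiers
Quantifiers specify how many times a given component of a regular expression must occur for a match. There are six of them: *, +, ?, {n}, {n,} and {n,m}.
The *, + and ? quantifiers are greedy: they match as much text as possible. Adding a ? after them makes them non-greedy (minimal matching).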
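The regular expression quantifiers are:
Character | Description |
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" as well as "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces around the comma. |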
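The quantifier examples, and the greedy versus non-greedy difference, can be checked with C++11 std::regex, whose default ECMAScript grammar follows the same rules. The helper name `firstMatch` is made up; it returns "" both for "no match" and for an empty match, which is fine for these examples.

```cpp
#include <regex>
#include <string>

// Text of the leftmost match of `pattern` in `text`, or "" if none.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str(0) : std::string();
}
```

Usage: `firstMatch("oooo", "o+")` gives "oooo" while `firstMatch("oooo", "o+?")` gives "o", and `firstMatch("fooooood", "o{1,3}")` gives "ooo".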
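Anchors
Anchors describe the boundaries of a string or word: ^ and $ refer to the start and end of the string, \b to the boundary before or after a word, and \B to a position that is not a word boundary.
Quantifiers cannot be applied to anchors.
Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, though: the corresponding match is captured (cached). To avoid this, put ?: before the first alternative.
?: is one of the non-capturing metacharacters; the other two are ?= and ?!, which carry further meaning. The former is a positive lookahead: it matches the search string at any position where a string matching the pattern in the parentheses begins. The latter is a negative lookahead: it matches the search string at any position where no string matching the pattern begins.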
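Non-capturing groups and lookaheads behave the same way in std::regex's ECMAScript grammar, which makes them easy to try out. A hedged sketch; `searchFirst` and `captureCount` are made-up helper names.

```cpp
#include <optional>
#include <regex>
#include <string>

// Text of the leftmost match, or nothing if the pattern does not match.
std::optional<std::string> searchFirst(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return std::nullopt;
    return m.str(0);
}

// Number of capturing groups in a pattern; (?:...) does not add one.
unsigned captureCount(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

Usage: `searchFirst("Windows 2000", "Windows (?=95|98|NT|2000)")` yields "Windows " without the "2000", because the lookahead checks the following text without consuming it.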
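Backreferences
Putting parentheses around a regular expression pattern, or part of one, causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered from left to right in the pattern.
The buffers are numbered starting at 1, up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n
is a one- or two-digit decimal number identifying a particular buffer.
The non-capturing metacharacters '?:', '?=' and '?!' can be used to suppress storing the corresponding match.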
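A backreference in action, as a hedged C++ sketch: \1 must repeat exactly what group 1 matched. The first function adapts the /\b([a-z]+) \1\b/gi example from the examples table without the g and i flags (std::regex::icase and std::sregex_iterator would be the rough equivalents); both function names are made up.

```cpp
#include <regex>
#include <string>

// First word that immediately repeats ("of of"), or "" if none. Case-sensitive.
std::string firstDoubledWord(const std::string& text) {
    static const std::regex doubled(R"(\b([a-z]+) \1\b)");
    std::smatch m;
    return std::regex_search(text, m, doubled) ? m.str(1) : std::string();
}

// True if two identical characters appear in a row: (.)\1
bool hasRepeatedChar(const std::string& text) {
    static const std::regex repeated(R"((.)\1)");
    return std::regex_search(text, repeated);
}
```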
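Operator precedence
Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from highest to lowest. From highest to lowest precedence:
Operator | Description |
\ | Escape |
(), (?:), (?=), [] | Parentheses and brackets |
*, +, ?, {n}, {n,}, {n,m} | Quantifiers |
^, $, \anymetacharacter | Anchors and sequences |
| | Alternation ("or") |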
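Because | has the lowest precedence, "^ab|cd$" means (^ab)|(cd$), not ^(ab|cd)$. A hedged std::regex sketch of the difference; the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Anchors bind tighter than |: either "ab" at the start or "cd" at the end.
bool matchesLoose(const std::string& s) {
    return std::regex_search(s, std::regex("^ab|cd$"));
}

// Grouping the alternatives makes both anchors apply to each one.
bool matchesStrict(const std::string& s) {
    return std::regex_search(s, std::regex("^(?:ab|cd)$"));
}
```

Usage: "abxyz" satisfies `matchesLoose` (the "^ab" branch alone is enough) but not `matchesStrict`.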
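Complete list of symbols
Character | Description |
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a newline, the sequence '\\' matches "\" and "\(" matches "(". |
^ | Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'. |
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. |
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" as well as "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces around the comma. |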
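? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the 'o's. |
. | Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside brackets '.' is literal, so '[.\n]' would match only '.' or a newline). |
(pattern) | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $1…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'. |
(?:pattern) | Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more concise expression than 'industry|industries'. |
(?=pattern) | Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match, i.e. the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead. |
(?!pattern) | Negative lookahead: matches the search string at any point where no string matching pattern begins. This is a non-capturing match, i.e. the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead. |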
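x|y | Matches x or y. For example, 'z|food' matches "z" or "food", while '(z|f)ood' matches "zood" or "food". |
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain". |
[^xyz] | Negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain". |
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'. |
[^a-z] | Negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character outside the range 'a' to 'z'. |
\b | Matches a word boundary, i.e. the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb". |
\B | Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never". |
\cx | Matches the control character indicated by x. For example, \cM matches Control-M or a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\f | Matches a form-feed character. Equivalent to \x0c and \cL. |
\n | Matches a newline character. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return character. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab character. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
\w | Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'. |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'. |
\xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A", while '\x041' is equivalent to '\x04' followed by "1". ASCII codes can be used in regular expressions. |
\num | Matches num, where num is a positive integer: a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters. |
\n | Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
\nm | Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm. |
\nml | If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml. |
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |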
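Some examples
Regular expression | Description |
/\b([a-z]+) \1\b/gi | Positions where the same word occurs twice in a row |
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/ | Parses a URL into protocol, domain, port and relative path |
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/ | Matches a line consisting only of "Chapter" or "Section" followed by a number from 1 to 99 |
/[-a-z]/ | Matches any of the 26 lowercase letters a to z, or a - sign. |
/ter\b/ | Matches the "ter" in "chapter" but not in "terminal" |
/\Bapt/ | Matches the "apt" in "chapter" but not in "aptitude" |
/Windows(?=95 |98 |NT )/ | Matches the "Windows" in "Windows95 ", "Windows98 " or "WindowsNT " (each alternative includes a trailing space). After a match is found, the next search starts right after "Windows". |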
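The URL example from the table, rewritten for std::regex as a hedged sketch. In a C++ string the pattern needs no / delimiters, so the \/ escapes (needed only inside a JavaScript /.../ literal) are dropped. `UrlParts` and `parseUrl` are made-up names.

```cpp
#include <optional>
#include <regex>
#include <string>

struct UrlParts {
    std::string protocol, host, port, path;
};

std::optional<UrlParts> parseUrl(const std::string& url) {
    static const std::regex re(R"((\w+)://([^/:]+)(:\d*)?([^# ]*))");
    std::smatch m;
    if (!std::regex_search(url, m, re))
        return std::nullopt;
    // Group 3 is optional; an unmatched group yields an empty string.
    return UrlParts{m.str(1), m.str(2), m.str(3), m.str(4)};
}
```

Usage: for "http://msdn.microsoft.com:80/scripting/default.htm" the parts are "http", "msdn.microsoft.com", ":80" and "/scripting/default.htm"; note the port group keeps its leading colon, exactly as in the original pattern.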