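I had been looking for PCRE learning material for a while without finding anything comprehensive online. Going back and reading the documents under doc/html in the PCRE source tree, I found that they are in fact very good learning material.
Today I looked at how to use PCRE; I have not yet touched the principles behind it or its implementation. The source can be downloaded from http://www.pcre.org/, and the downloaded archive pcre-x.x.tar.bz2 is easy to build and install on Linux (on x86-family CPUs, that is).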
./configure
make
make install
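Once built and installed, PCRE is made available to user programs as a library. The PCRE library provides a set of API functions that implement regular expression searching and matching with Perl-like syntax. (The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences.)
Using PCRE well requires knowing a good deal about regular expressions, and PCRE also has many configuration settings that make it support different modes and specifications. Here I only describe briefly how to use PCRE; configuration and regular expression syntax are not covered.
Using PCRE mainly means using the four functions below. Once you understand them, working with the PCRE library becomes much simpler.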
pcre_compile() /pcre_compile2()
pcre_study()
pcre_exec()
1. pcre_compile() / pcre_compile2(): a regular expression must be compiled before it can be used.
pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr);
pcre *pcre_compile2(const char *pattern, int options, int *errorcodeptr, const char **errptr, int *erroffset, const unsigned char *tableptr);
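The purpose of compilation is to convert the regular expression pattern into a structure that the PCRE engine can work with (struct real_pcre).
I have not yet analyzed the compilation process itself.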
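2. pcre_study(): analyzes ("studies") the compiled regular expression structure (struct real_pcre). The result is a data structure (struct pcre_extra) that can be passed to pcre_exec() together with the compiled pattern (struct real_pcre) for matching.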
If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.
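pcre_study() was introduced mainly to speed up regular expression matching. (Why does studying make matching faster?) It is quite useful: a regular expression can be compiled and studied once, then kept in memory or saved to a file, so that later matching is more efficient. Snort does this.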
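On the "why" question: according to the pcreapi documentation of this era, studying helps for non-anchored patterns that do not have a single fixed starting character, and what it produces is a bitmap of possible starting bytes. The matcher can then skip any subject position whose byte cannot begin a match. The sketch below is only an illustration of that idea for an alternation of literal words; it is not PCRE's code, and all names in it (start_bitmap, build_start_bitmap, find_first) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One bit per possible byte value: 256 bits = 32 bytes. */
typedef struct { unsigned char bits[32]; } start_bitmap;

/* "Study" step: record the first byte of every literal alternative. */
void build_start_bitmap(start_bitmap *bm, const char *const *words, size_t nwords)
{
    memset(bm->bits, 0, sizeof bm->bits);
    for (size_t i = 0; i < nwords; i++) {
        if (words[i][0] == '\0')
            continue;  /* empty alternatives are ignored in this sketch */
        unsigned char c = (unsigned char)words[i][0];
        bm->bits[c / 8] |= (unsigned char)(1u << (c % 8));
    }
}

/* Returns 1 if some alternative begins with byte c. */
int can_start(const start_bitmap *bm, unsigned char c)
{
    return (bm->bits[c / 8] >> (c % 8)) & 1;
}

/* Match step: offset of the first occurrence of any word, or -1.
 * *attempts counts the positions where a full comparison was tried. */
long find_first(const start_bitmap *bm, const char *const *words, size_t nwords,
                const char *subject, size_t *attempts)
{
    size_t len = strlen(subject);
    *attempts = 0;
    for (size_t pos = 0; pos < len; pos++) {
        if (!can_start(bm, (unsigned char)subject[pos]))
            continue;  /* cheap rejection: one bit test instead of a match attempt */
        (*attempts)++;
        for (size_t i = 0; i < nwords; i++) {
            size_t wlen = strlen(words[i]);
            if (wlen > 0 && wlen <= len - pos &&
                memcmp(subject + pos, words[i], wlen) == 0)
                return (long)pos;
        }
    }
    return -1;
}
```

For the words "cat" and "dog" and the subject "the cat sat on the mat", only the position of the 'c' triggers a full comparison; every earlier position is rejected by the bit test alone.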
3. pcre_exec(): searches the given subject string for a match of the regular expression and returns the match result.
The function pcre_exec() is called to match a subject string against a compiled pattern, which is passed in the code argument. If the pattern has been studied, the result of the study should be passed in the extra argument. This function is the main matching facility of the library, and it operates in a Perl-like manner.
4. How does Snort use PCRE? Snort calls PCRE from a plugin to do regular expression matching.
1) Initializing the regular expressions.
InitializeDetection --> RegisterRules --> RegisterOneRule --> PCRESetup (only for OPTION_TYPE_PCRE) --> pcre_compile and pcre_study. The results are kept in memory in a structure called PCREInfo.
2) Matching rules. DetectionCheckRule --> ruleMatch --> ruleMatchInternal --> pcreMatch (OPTION_TYPE_PCRE) --> pcre_test --> pcre_exec.
5. Compiling PCRE on the TILERA platform.
1) tar -xjvf pcre-7.9.tar.bz2
2) Modify config.sub to support tile architecture.
We wish to use HOST=tile, but the tile architecture is not yet standard, so it may not exist in the config.sub file. If necessary, add these lines in the alphabetical list of architectures (typically about 1,100 lines down):
tile*)
basic_machine=tile-tilera
os=-linux-gnu
;;
3) Compile PCRE on tile Linux.
** Start up TILERA card through tile-monitor.
tile-monitor --pci --mount-tile /usr \
--mount-tile /bin --mount-tile /sbin --mount-tile /etc --mount-tile /lib \
--mkdir /mnt/libs --mount /libs-compile /mnt/libs \
--mkdir /mnt/mde --mount $TILERA_ROOT /mnt/mde
* ./configure --build=tile --prefix=/usr lt_cv_sys_max_cmd_len=262144 --disable-cpp
// C++ support was not enabled for this build.
pcre-7.9 configuration summary:
Install prefix .................. : /usr
C preprocessor .................. : gcc -E
C compiler ...................... : gcc
C++ preprocessor ................ : g++ -E
C++ compiler .................... : g++
Linker .......................... : /usr/bin/ld
C preprocessor flags ............ :
C compiler flags ................ : -O2
C++ compiler flags .............. : -O2
Linker flags .................... :
Extra libraries ................. :
Build C++ library ............... : no
Enable UTF-8 support ............ : no
Unicode properties .............. : no
Newline char/sequence ........... : lf
\R matches only ANYCRLF ......... : no
EBCDIC coding ................... : no
Rebuild char tables ............. : no
Use stack recursion ............. : yes
POSIX mem threshold ............. : 10
Internal link size .............. : 2
Match limit ..................... : 10000000
Match limit recursion ........... : MATCH_LIMIT
Build shared libs ............... : yes
Build static libs ............... : yes
Link pcregrep with libz ......... : no
Link pcregrep with libbz2 ....... : no
Link pcretest with libreadline .. : no
* make
* make install
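4) Compile the PCRE demo code and test the PCRE library on TILERA Linux. The PCRE source ships with two demo programs. One is the fairly simple pcredemo.c, which is easy to follow; the other is pcretest.c, which exercises the PCRE library much more thoroughly. Both are very good learning material in their own right.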
# gcc -o pcredemo pcredemo.c -lpcre
# ./pcredemo 'cat|dog' 'the cat sat on the mat'
Match succeeded at offset 4
0: cat
No named substrings
# ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
Match succeeded at offset 4
0: dog
No named substrings
Match succeeded again at offset 19
0: cat
No named substrings
// References:
PCRE source documentation: pcre-7.9/doc/html