字符串匹配算法(2)---String Matching Algorithm

前面已经说到了naive的方法来匹配字符串，但是该方法的特点是复杂度高，未能充分利用计算得到的值作为下一步计算的结果，因此，复杂度相当比较高。
Rabin-Karp提出了新的字符串匹配方法：
Rabin-Karp方法主要是分2部分：
(1)预处理(pre-processing)
对pattern字符串的m个进行计算，得到其hash值。复杂度为O(m)
(2)匹配(matching)
for(int i=0;i<n-m;i++)
{
    计算t[i..i+m]的hash值。
    再同pattern字符串的hash值进行对比，如果相同，则使用naive办法，继续一个字符一个字符地比较。
    使用Horner法则计算下一个m字符的hash值。(即h(s+1)=f(h(s))).
}
整个算法的复杂度：
在最差的情况下，复杂度为O(m)+O(m*(n-m+1))
期望的情况为O(n-m+1+cm)=O(n+m).

实验代码如下：（暂时只是考虑支持12345678901234567890 匹配 234等的情况）

1 #include <iostream>
2 #include <string>
3
4 using namespace std;
5
6 #define Q    997 //Should be a prime
7 #define D    10    //|sizeof character set|
8
9 inline int geth(int m)//h=d^(m-1)%q
10 {
11     int len=1;
12     for(int i=0;i<(m-1);i++)
13     {
14         len*=D;
15     }
16     return len%Q;
17 }
18
19 int main(int argc,char* argv[])
20 {
21     char *t=NULL,*p=NULL;
22     string text,pattern;
23     int h,p0=0,t0=0;
24
25     while(1)
26     {
27         int index = 0;
28
29         cin>>text>>pattern;
30         int n = text.length();
31         int m = pattern.length();
32
33         //t= new char[n+1];
34         //p= new char[m+1];
35         t= new char[n];
36         p= new char[m];
37
38         h = geth(m);
39         p0=t0=0;
40
41         //strcpy(t,text.c_str());
42         //strcpy(p,pattern.c_str());
43         memcpy(t,text.data(),n);
44         memcpy(p,pattern.data(),m);
45
46         for(int i=0;i<m;i++)//get the initial value from pattern and text
47         {
48             p0 =(D*p0+(p[i]-'0'))%Q;
49             t0 = (D*t0+(t[i]-'0'))%Q;
50         }
51
52         for(int i=0;i<(n-m);i++)
53         {
54             if(p0==t0)//should call the naive algorithm to check whether the two string is the same
55             {
56                 bool match = true;
57                 for(int j=0;j<m;j++)
58                 {
59                     if(p[j]!=t[i+j])
60                         match = false;
61                 }
62
63                 if(match)
64                     cout<<i<<endl;
65             }
66
67             t0 = ((D*(t0-(t[i]-'0')*h)+t[i+m])-'0')%Q;
68         }
69
70         delete[] t;
71         delete[] p;
72     }
73
74     system("pause");
75     return 0;
76 }
77

posted on 2007-08-20 21:11 Kingoal Lee's Alogrithm Study using cplusplus 阅读(465) 评论(0) 编辑收藏引用

只有注册用户登录后才能发表评论。
【推荐】100%开源！大型工业跨平台软件C++源码提供，建模，组态！



网站导航: 博客园 IT新闻 BlogJava 博问 Chat2DB 管理

Problem Solving using C++

字符串匹配算法(2)---String Matching Algorithm

My Links

Blog Stats

常用链接

留言簿(1)

随笔档案

搜索

最新评论

阅读排行榜

评论排行榜