字典树原理（转）

Trie树就是字典树，其核心思想就是空间换时间。

举个简单的例子。

给你100000个长度不超过10的单词。对于每一个单词，我们要判断他出没出现过，如果出现了，第一次出现第几个位置。
这题当然可以用hash来，但是我要介绍的是trie树。在某些方面它的用途更大。比如说对于某一个单词，我要询问它的前缀是否出现过。这样hash就不好搞了，而用trie还是很简单。
现在回到例子中，如果我们用最傻的方法，对于每一个单词，我们都要去查找它前面的单词中是否有它。那么这个算法的复杂度就是O(n^2)。显然对于100000的范围难以接受。现在我们换个思路想。假设我要查询的单词是abcd，那么在他前面的单词中，以b，c，d，f之类开头的我显然不必考虑。而只要找以a开头的中是否存在abcd就可以了。同样的，在以a开头中的单词中，我们只要考虑以b作为第二个字母的……这样一个树的模型就渐渐清晰了……
假设有b，abc，abd，bcd，abcd，efg，hii这6个单词，我们构建的树就是这样的。

对于每一个节点，从根遍历到他的过程就是一个单词，如果这个节点被标记为红色，就表示这个单词存在，否则不存在。
那么，对于一个单词，我只要顺着他从跟走到对应的节点，再看这个节点是否被标记为红色就可以知道它是否出现过了。把这个节点标记为红色，就相当于插入了这个单词。
这样一来我们询问和插入可以一起完成，所用时间仅仅为单词长度，在这一个样例，便是10。
我们可以看到，trie树每一层的节点数是26^i级别的。所以为了节省空间。我们用动态链表，或者用数组来模拟动态。空间的花费，不会超过单词数×单词长度。

给出一个用类封装的字典树代码，厄。。。做ACM的模板用另一个。。应该放在了“ACM模板”文件夹下了。。。

#include <cstdio>

#include <iostream>

#include <cstring>

using namespace std;

const int num_chars = 26;

class Trie {

public:

Trie():root(NULL){};

Trie(Trie& tr);

int search(const char* word, char* entry ) const;

int insert(const char* word, const char* entry);

int remove(const char* word, char* entry);

private:

struct Trie_node

{

char* data;

Trie_node* branch[num_chars];

Trie_node();

}* root;

};

Trie::Trie_node::Trie_node()

{

data = NULL;

for (int i=0; i<num_chars; ++i)

branch[i] = NULL;

}

int Trie::search(const char* word, char* entry ) const

{

int position = 0;

char char_code;

Trie_node *location = root;

while( location!=NULL && *word!=0 )

{

if (*word>='A' && *word<='Z')

char_code = *word-'A';

else if (*word>='a' && *word<='z')

char_code = *word-'a';

else return 0;

location = location->branch[char_code];

position++;

word++;

}

if ( location != NULL && location->data != NULL )

{

strcpy(entry,location->data);

return 1;

}

else return 0;

}

int Trie::insert(const char* word, const char* entry)

{

int result = 1, position = 0;

if ( root == NULL ) root = new Trie_node;

char char_code;

Trie_node *location = root;

while( location!=NULL && *word!=0 )

{

if (*word>='A' && *word<='Z')

char_code = *word-'A';

else if (*word>='a' && *word<='z')

char_code = *word-'a';

else return 0;

if( location->branch[char_code] == NULL )

location->branch[char_code] = new Trie_node;

location = location->branch[char_code];

position++;

word++;

}

if (location->data != NULL)

result = 0;

else {

location->data = new char[strlen(entry)+1];

strcpy(location->data, entry);

}

return result;

}

int main()

{

Trie t;

char entry[100];

t.insert("aa", "DET");

t.insert("abacus","NOUN");

t.insert("abalone","NOUN");

t.insert("abandon","VERB");

t.insert("abandoned","ADJ");

t.insert("abashed","ADJ");

t.insert("abate","VERB");

t.insert("this", "PRON");

if (t.search("this", entry))

cout<<"'this' was found. pos: "<<entry<<endl;

if (t.search("abate", entry))

cout<<"'abate' is found. pos: "<<entry<<endl;

if (t.search("baby", entry))

cout<<"'baby' is found. pos: "<<entry<<endl;

else

cout<<"'baby' does not exist at all!"<<endl;

if (t.search("aa", entry))

cout<<"'aa was found. pos: "<<entry<<endl;

}

PS:实现方法http://met.fzu.edu.cn/eduonline/web/web/resources/articleContent.asp?id=346

发表于 2008-11-16 11:42 hunter 阅读(7985) 评论(7) 编辑收藏引用所属分类: algorithm

# re: 字典树原理（转）回复更多评论

您的类里边没有定义析构函数，这样会不会造成内存泄露啊。。。

chasefornone 评论于 2010-08-31 16:14

# re: 字典树原理（转）[未登录] 回复更多评论

不会的

yy 评论于 2010-10-15 19:58

不会才怪菜鸟写的

TonyWang 评论于 2011-09-30 10:28

只有new，没有delete，必然内存泄露

123 评论于 2012-09-03 13:55

代码风格不是很好

ygqwna 评论于 2013-08-31 23:23

一看就是c++外行写的代码，

ddd 评论于 2013-10-25 08:48

常用链接

留言簿(4)

随笔分类

随笔档案

搜索

最新评论

阅读排行榜

评论排行榜