MyMSDN

MyMSDN记录开发新知道

公共字符串匹配矩阵(max_match)

image

晚上在跟同事聊到字符串匹配的时候,同事提到了矩阵,觉着是个很牛的东西,它的结果如果出现连续的斜线的话,则说明有匹配了,这可以用于做快速搜索,而且也可以搜索最大公共字符串。据说有个很牛的算法专门用来做“最大公共字符串”的,但我还没去找,先将这个矩阵输出试试。

本程序提供两种输出方式,以参数-w和-wo进行区分,分别用于显示人眼识别的信息,以及可供后续程序进行处理的纯净信息。本程序同时提供通用的帮助查看方式,参数为-h或/?等。

通过矩阵,我们就很容易得出最大匹配结果。至于如何分析,暂时还没有做考虑,有空会进行分析。

// max_match.cpp : 定义控制台应用程序的入口点。
//

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define BOOL int
#define TRUE 1
#define FALSE 0
typedef char element_t;
#define ele_space ' '
#define ele_break '\n'
#define ele_placeholder '*'
#define ele_end '\0'

element_t *malloc_space(element_t* str1, element_t* str2, BOOL with_illuminate);
element_t *max_match_matrix(element_t* str1, element_t* str2, BOOL with_illuminate, element_t *matrix);
/*display the help informations*/
void man_this(void);
/*
purpose:
    Generate a matrix for max match with str1 & str2.
    e.g:
        str1 = "abcdefghijk";
        str2 =    "define";
    The str2 with the same substring "def" in str1.
    Show the relationship between the two string by matrix.
    You can use the pipe for dealing with the result.
invoke:
    max_match {options} str1 str2
options:
    @-w(default): Display the result infomations for view.
    @    --with_illuminate: The same function as option -w;
    @-wo: Don't display the result infomations for view.
    @    --without_illuminate: The same function as option -wo;
    @-h: Display the help info for users.
    @    --help: The same function as option -h;
    @    /h: The same function as option -h;
    @    /help: The same function as option -h;
    @    /?: The same function as option -h;
    @    -?: The same function as option -h;
*/
int main(int argc, char* argv[])
{
    char *str1, *str2;
    BOOL with_illuminate = TRUE;
    if(argc == 3)    
    {
        str1 = argv[1];
        str2 = argv[2];
    }
    else if(argc == 4)
    {
        if(strcmp(argv[1], "-w") == 0 || strcmp(argv[1], "--with_illuminate") == 0)
            with_illuminate = TRUE;
        else if(strcmp(argv[1], "-wo") == 0 || strcmp(argv[1], "--without_illuminate") == 0)
            with_illuminate = FALSE;
        else
        {
            printf("ERROR: error paramaters in!\n");
            return -2;
        }

        str1 = argv[2];
        str2 = argv[3];
    }
    else if(argc == 2)
    {
        if(strcmp(argv[1], "-h") == 0
            || strcmp(argv[1], "/h") == 0
            || strcmp(argv[1], "--help") == 0
            || strcmp(argv[1], "/help") == 0
            || strcmp(argv[1], "/?") == 0
            || strcmp(argv[1], "-?") == 0)
        {
            man_this();
            return 0;
        }
    }
    else
    {
        printf("ERROR: No enough paramaters!\n");
        return -1;
    }
    
    if(with_illuminate)
    {
        printf("str1:\t|%s\n", str1);
        printf("str2:\t|%s\n", str2);
        printf("\nresult:\n");
    }

    element_t *matrix = malloc_space(str1, str2, with_illuminate);
    printf("%s", max_match_matrix(str1, str2, with_illuminate, matrix));
    free(matrix);
    return 0;
}

element_t *max_match_matrix(element_t* str1, element_t* str2, BOOL with_illuminate, element_t *matrix)
{
    int curr = with_illuminate ? 0 : -1;
    if(with_illuminate)
    {
        matrix[curr] = ele_space;
        int i = -1;
        while(str1[++i] != ele_end)
        {
            matrix[++curr] = str1[i];
        }
        matrix[++curr] = ele_break;
    }
    int j = -1;
    while(str2[++j] != ele_end)
    {
        if(with_illuminate)    matrix[++curr] = str2[j];
        int z = -1;
        while(str1[++z] != ele_end)
        {
            if(str1[z] == str2[j])
                matrix[++curr] = ele_placeholder;
            else
                matrix[++curr] = ele_space;
        }
        matrix[++curr] = ele_break;
    }

    matrix[++curr] = ele_end;
    return matrix;
}

element_t *malloc_space(element_t* str1, element_t* str2, BOOL with_illuminate)
{
    int len1 = strlen(str1);
    int len2 = strlen(str2);
    int len = 0;
    if(with_illuminate)
        len = (len1 + 2) * (len2 + 1) + 1;
    else
        len = (len1 + 1) * len2 + 1;

    element_t* result;
    if((result = (element_t*)malloc((size_t)(len * sizeof(element_t)))) == NULL)
    {
        printf("ERROR: No enough space!");
        return NULL;
    }
    return result;
}

void man_this(void)
{
    printf("%s", "purpose:\n");
    printf("%s", "    Generate a matrix for max match with str1 & str2.\n");
    printf("%s", "    e.g:\n");
    printf("%s", "        str1 = \"abcdefghijk\";\n");
    printf("%s", "        str2 =    \"define\";\n");
    printf("%s", "    The str2 with the same substring \"def\" in str1.\n");
    printf("%s", "    Show the relationship between the two string by matrix.\n");
    printf("%s", "    You can use the pipe for dealing with the result.\n");
    printf("%s", "invoke:\n");
    printf("%s", "    max_match {options} str1 str2\n");
    printf("%s", "options:\n");
    printf("%s", "    @-w(default): Display the result infomations for view.\n");
    printf("%s", "    @    --with_illuminate: The same function as option -w;\n");
    printf("%s", "    @-wo: Don't display the result infomations for view.\n");
    printf("%s", "    @    --without_illuminate: The same function as option -wo;\n");
    printf("%s", "    @-h: Display the help info for users.\n");
    printf("%s", "    @    --help: The same function as option -h;\n");
    printf("%s", "    @    /h: The same function as option -h;\n");
    printf("%s", "    @    /help: The same function as option -h;\n");
    printf("%s", "    @    /?: The same function as option -h;\n");
    printf("%s", "    @    -?: The same function as option -h;\n");
}

posted on 2009-08-07 00:47 volnet 阅读(804) 评论(1)  编辑 收藏 引用

评论

# re: 公共字符串匹配矩阵(max_match) 2009-08-07 01:18 volnet

孤陋寡闻了,原来这种东西满大街都是,呵呵:
http://blog.sina.com.cn/s/blog_455b20c10100929m.html
http://andylin02.javaeye.com/blog/437166
关键字:字符串相似度,最大公共字符串,LCS算法等  回复  更多评论   


只有注册用户登录后才能发表评论。
网站导航: 博客园   IT新闻   BlogJava   知识库   博问   管理


特殊功能