Problem 44: Longest Prefix
Longest Prefix
IOI'96
The structure of some biological objects is represented by the sequence of
their constituents denoted by uppercase letters. Biologists are interested in
decomposing a long sequence into shorter ones called primitives.
We say that a sequence S can be composed from a given set of primitives P if
there is a some sequence of (possibly repeated) primitives from the set whose
concatenation equals S. Not necessarily all primitives need be present. For
instance the sequence ABABACABAABcan be composed from the set of
primitives
{A, AB, BA, CA, BBC}
The first K characters of S are the prefix of S with length K.
Write a program which accepts as input a set of primitives and a sequence of
constituents and then computes the length of the longest prefix that can be
composed from primitives.
PROGRAM NAME: prefix
INPUT FORMAT
First, the input file contains the list (length 1..200) of
primitives (length 1..10) expressed as a series of space-separated strings of
upper-case characters on one or more lines. The list of primitives is terminated
by a line that contains nothing more than a period (`.'). No primitive appears
twice in the list. Then, the input file contains a sequence S (length
1..200,000) expressed as one or more lines, none of which exceed 76 letters in
length. The "newlines" are not part of the string S.
SAMPLE INPUT (file prefix.in)
A AB BA CA BBC
.
ABABACABAABC
OUTPUT FORMAT
A single line containing an integer that is the length of
the longest prefix that can be composed from the set P.
SAMPLE OUTPUT (file prefix.out)
11
题意:给出一对基因序列,在给出一个长串。求出该长串中能与基因序列匹配的最多个数,即最长前缀。代码如下
/*
LANG: C
TASK: prefix
*/
#include<stdio.h>
int T[200010], Len[200];
char s[200][12], str[200010];
void Set(int len)
{
int i;
T[i] = 0;//将第一个初始化为0.目的是为了鉴别-1
for (i = 1; i <= len; i++)
{
T[i] = -1;
}
}
int Find(int len, int num)
{
int i, j, k, l, sum = 0;
for (i = 0; i < len; i++)//遍历整个长串
{
for (j = 0; j < num; j++)
{
k = i - Len[j];//计算将要匹配的开始位置
if (k + 1 >= 0 && T[k+1] != -1)
{
for (l = 0; l < Len[j]; l++)
{
if (str[k+1+l] != s[j][l])
{
break;
}
}
if (l == Len[j])//表示该字串在基因里
{
if (T[k+1] + l > T[i+1])//如果有更长的,则跟新。
{
T[i+1] = T[k+1] + l;
}
}
}
}
if (sum < T[i+1])//更新目标变量
{
sum = T[i+1];
}
}
return sum;
}
int main()
{
freopen("prefix.in", "r", stdin);
freopen("prefix.out", "w", stdout);
int i, num = 0;
while (scanf("%s", s[num]))//读入有多行
{
if (strcmp(s[num], ".") == 0)
{
break;
}
Len[num] = strlen(s[num]);
num++;
}
getchar();
i = 0;
while (scanf("%c", &str[i]) != EOF)//读入有多行
{
if (str[i] != '\n')
{
i++;
}
}
str[i] = 0;
Set(i);
printf("%d\n", Find(i, num));
fclose(stdin);
fclose(stdout);
//system("pause");
return 0;
}
/*
A B C
.
DABCDHETHBNAGRKGJTHNUE
AH AS AZ BW CD CK CN CU CZ DB DH EC ED EN FJ GA GK GM GS GT
HA HN HZ IN IR JB JD JM JZ KG KI LO LQ LU LW LY MJ MT MV ND
NM NS OB OI OK OM PG PO PQ PZ QE QP RG RK RN RP RQ RR RS RU
SA SF SJ SN TK TR TU TY UA UO US UW UX VH VL VO WF WH WL WS
WZ XU XW XY YA YI YN YT ZF ZH ZJ ZL ZR ZX
.
ASCDCKCUEDFJGKGMHAIRJMKILQLWLYMJMTMVNSOIOKOMPOPQQPRGRNRPSJTU
TYWSWZXWYIYTAHAZBWCDCNCUDHECENGAGMGSGTHAHNKGLQLULWNMOBOIOKOM
PGPOPQPZRGRKRNRPRRSATRTUUWVLVOWFWLWSXUXYYAYIYNYTZFZJZRZXBWCK
CUFJGAGSHAHNIRJBJMKILQLUNDNMNSOKOMPQQERGRQRRRSSASFSJSNTUTYUA
UOUWVLWHXUXWXYYAYTZFZJZLZRZXASBWCDCNDBECEDFJGSHAHNHZINJMKGKI
LQMJNMNSOKOMPQQPRRRSSFSJSNTRTUTYUAUWUXVLVOWLWSWZXYYNYTZFZHZR
ZXAHASAZCDCKCNDBECENGMGSHAHZIRJDKGKILOLYOBOKPGPQQPRRSNTRTYUO
USUXVHVLVOWFWLWSXYYNZJZXZZAHAZBWCDCKCNDB
444
*/