一个文件中有40亿个整数,每个整数为四个字节,内存为1GB,写出一个算法:求出这个文件里的整数里不包含的一个整数
下面的代码片段仅仅是一个样例。
4个字节的整数最大可表示为2^32=4294967296, 一个数一个数的读入内存,建立一个bit map,共需要4294967296个bits(也就是0.5G字节的内存,并没有超过1G内存的限制),读入每一个数,置相应的bit为1。
1 int N = 20; // # of number
2 int M = 1000; // number range
3 std::vector<int> a(N); // can be imported from external file number by number
4 for (int i = 0; i < N; i++)
5 a[i] = (int)rand()%M;
6 std::copy(a.begin(), a.end(), std::ostream_iterator<int>(std::cout, " "));
7 std::cout << "\n";
8 // bit map setup for existence of each number
9 unsigned int nbytes = M%8 ? (M/8+1) : (M/8);
10 std::cout << "nbytes = " << nbytes << "\n";
11
12 char* p = new char [nbytes];
13 memset(p, 0, sizeof(char)*nbytes);
14
15 for (int i = 0; i < N; i++) {
16 unsigned int index = a[i]/8;
17 unsigned int bitpos = a[i]%8;
18 char* tmp = p+index;
19 *tmp |= 1 << bitpos;
20 //std::cout << "bit pos set to 1 : " << 8*index+bitpos << "\n";
21 }
22 for (int i = nbytes-1; i >= 0; i--) {
23 printf("%02X ", (char)*(p+i)&0xFF);
24 }
25 std::cout << "\n";
26 delete [] p;
27