Apart from years of programming experience, what else separates a "veteran" programmer from a "novice"? Programming skill is certainly part of it, but by no means all of it. Clever programmers may have flashier coding skills than their peers, yet that alone does not make them veterans. Likewise, ten years of programming experience does not automatically make someone an expert, and years spent in a job prove little by themselves: not getting fired does not raise your value. The following are things most senior programmers do.
1. Master at least a second programming language. I believe there are excellent programmers who know (and are expert in) only one language, but to some degree that constrains their thinking: when all you have is a hammer, everything looks like a nail. Knowing and successfully using at least one more language is an important step from novice to veteran. Auxiliary languages such as JavaScript and SQL only count once you have actually built a complete application that uses them.
2. Program outside of work. I have complained about open-source contributions being used as a hiring requirement, but only because many passionate programmers spend their spare time elsewhere. Besides contributing to open source, you can consult part-time, run a side startup, build your own product, or found your own micro software company. You can also take on freelance projects; see the jobbole.com article "A few key points for taking on projects successfully". Note: an mISV (MicroISV) is a software company with a single employee, a micro company.
3. Go through the full software development cycle, from concept and design to implementation and then maintenance. Some programmers hope to be handed a detailed specification without lifting a finger and then pass defect-ridden code to a test/maintenance team; that is the picture of a mediocre programmer. Any competent programmer works closely with the customer on the requirements, implements them, and also maintains the result. If you cut corners during implementation, you pay for it during maintenance.
4. Keep innovating. Innovation means doing something the people around you have not done in order to improve your process or product. You do not have to be the first person in the world to do it; it is enough to spot a problem, find a solution, and implement it.
5. Write software that solves real problems. There is a fictional image of a hacker who codes day and night purely out of love for technology and for what he is doing, but that contributes almost nothing to becoming a good developer. I have actually seen developers argue with customers in favour of technically nicer solutions that did not help the customer; that backfires. Polish your craft on your own time, but at work you are better off writing code that actually improves things and solves problems rather than showing off unusual algorithms or interfaces.
These points are reasonable for anyone who wants to become a senior developer, because none of them has anything to do with how many years of programming experience you have. If you can do four or five of the items above, you are a senior programmer.
http://blog.csdn.net/dark_scope/article/details/9421061
==========================================================================================
I have been reading a lot about Deep Learning lately -- blogs and papers -- but frankly I have neglected implementation: my computer is not very powerful, and I am not yet able to write a toolbox of my own. I have only followed Andrew Ng's UFLDL tutorial and filled in code within the existing framework (that code is on github). Later I found a MATLAB Deep Learning toolbox whose code is quite simple, which makes it a good vehicle for studying the algorithms; a MATLAB implementation also spares a lot of data-structure code, so the algorithmic flow stays very clear. So I want to walk through this toolbox's code, both to consolidate what I have learned and to prepare for hands-on work later. (This article reads the algorithms purely from the code's perspective; for the theory you still need the papers, some of which I name in the text. The aim is to lay out the algorithmic process, not to dig into derivations and formulas.)
==========================================================================================
Code used: DeepLearnToolbox (download link in the original post). Thanks to the toolbox's author.
==========================================================================================
Chapter one starts from analyzing NN (neural network), because this is the overall framework of deep learning; see UFLDL.
==========================================================================================
First look at \tests\test_example_NN.m. Skipping the part that normalizes the data, the key lines are as follows (to get the comments syntax-highlighted, I changed the % in the MATLAB code to //):
- nn = nnsetup([784 100 10]);
- opts.numepochs = 1; // Number of full sweeps through data
- opts.batchsize = 100; // Take a mean gradient step over this many samples
- [nn, L] = nntrain(nn, train_x, train_y, opts);
- [er, bad] = nntest(nn, test_x, test_y);
These few simple steps train an NN. The most important functions here are nnsetup, nntrain and nntest, so let us analyze them one by one, starting with \NN\nnsetup.m.
nnsetup
- function nn = nnsetup(architecture)
- //First, read the overall structure of the network from the architecture argument; nn.n is the number of layers. Compare with the example call nnsetup([784 100 10]) above.
-
- nn.size = architecture;
- nn.n = numel(nn.size);
- //Next comes a batch of parameters; each will be explained when it is actually used.
- nn.activation_function = 'tanh_opt'; // Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
- nn.learningRate = 2; // learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
- nn.momentum = 0.5; // Momentum
- nn.scaling_learningRate = 1; // Scaling factor for the learning rate (each epoch)
- nn.weightPenaltyL2 = 0; // L2 regularization
- nn.nonSparsityPenalty = 0; // Non sparsity penalty
- nn.sparsityTarget = 0.05; // Sparsity target
- nn.inputZeroMaskedFraction = 0; // Used for Denoising AutoEncoders
- nn.dropoutFraction = 0; // Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
- nn.testing = 0; // Internal variable. nntest sets this to one.
- nn.output = 'sigm'; // output unit 'sigm' (=logistic), 'softmax' and 'linear'
- //Initialize the structure of each layer. Three fields are created: W, vW and p, of which W holds the actual weights.
- //vW is a temporary variable used when updating the weights (momentum), and p is the so-called sparsity statistic (detailed when we reach that code).
- for i = 2 : nn.n
- // weights and weight momentum
- nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
- nn.vW{i - 1} = zeros(size(nn.W{i - 1}));
-
- // average activations (for use with sparsity)
- nn.p{i} = zeros(1, nn.size(i));
- end
- end
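A side note on the weight initialization above (my own summary, not from the toolbox docs): each weight matrix, including the bias column, is drawn uniformly from a symmetric interval whose half-width is
$$r = 4\sqrt{\frac{6}{n_i + n_{i-1}}}, \qquad W_{i-1} \sim U(-r,\ r),$$
where $n_i$ and $n_{i-1}$ are the sizes of the current and previous layers. This is the normalized initialization usually recommended for sigmoid-type units (the factor 4 is the sigmoid variant); it keeps the initial activations away from the saturated regions.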
nntrain
That is roughly the whole setup process. Now on to training: open \NN\nntrain.m. We skip the code that validates the input arguments and go straight to the key part. For the denoising part, see the paper "Extracting and Composing Robust Features with Denoising Autoencoders".
- m = size(train_x, 1);
- //m is the number of training samples
- //note that opts was set in the caller; batchsize is the mini-batch size used for the batch gradient step
- batchsize = opts.batchsize; numepochs = opts.numepochs;
- numbatches = m / batchsize; //number of batches
- assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');
- L = zeros(numepochs*numbatches,1);
- n = 1;
- //numepochs is the number of passes (epochs) over the data
- for i = 1 : numepochs
- tic;
- kk = randperm(m);
- //shuffle the training order; randperm(m) generates a random permutation of 1..m
- for l = 1 : numbatches
- batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);
- //Add noise to input (for use in denoising autoencoder)
- //add noise; this is the part needed by the denoising autoencoder
- //see the paper "Extracting and Composing Robust Features with Denoising Autoencoders"
- //concretely, a fraction of the input entries is set to 0; inputZeroMaskedFraction is that fraction
- if(nn.inputZeroMaskedFraction ~= 0)
- batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
- end
- batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);
- //the three core functions:
- //nnff does the feedforward pass, nnbp the backpropagation, nnapplygrads the gradient-descent update
- //their code is analyzed below
- nn = nnff(nn, batch_x, batch_y);
- nn = nnbp(nn);
- nn = nnapplygrads(nn);
- L(n) = nn.L;
- n = n + 1;
- end
-
- t = toc;
- if ishandle(fhandle)
- if opts.validation == 1
- loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
- else
- loss = nneval(nn, loss, train_x, train_y);
- end
- nnupdatefigures(nn, fhandle, loss, opts, i);
- end
-
- disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1))))]);
- nn.learningRate = nn.learningRate * nn.scaling_learningRate;
- end
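One practical note: the assert above requires the number of training samples to be an exact multiple of opts.batchsize. If your data set does not satisfy this, a simple pre-processing step before calling nntrain would be to drop the remainder, for example (my own hypothetical snippet, not part of the toolbox):
% drop the trailing samples so that size(train_x,1) is a multiple of opts.batchsize
m = size(train_x, 1);
m = m - mod(m, opts.batchsize);   % largest multiple of batchsize not exceeding m
train_x = train_x(1:m, :);
train_y = train_y(1:m, :);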
Now let us analyze the three functions nnff, nnbp and nnapplygrads.
nnff
nnff performs the feedforward pass; it is actually very simple -- just run the whole network forward once. It also contains the dropout and sparsity computations; for details see the paper "Improving Neural Networks with Dropout" and the UFLDL note "Autoencoders and Sparsity".
- function nn = nnff(nn, x, y)
- //NNFF performs a feedforward pass
- // nn = nnff(nn, x, y) returns an neural network structure with updated
- // layer activations, error and loss (nn.a, nn.e and nn.L)
-
- n = nn.n;
- m = size(x, 1);
-
- x = [ones(m,1) x];
- nn.a{1} = x;
-
- //feedforward pass
- for i = 2 : n-1
- //forward propagation using whichever activation function was selected
- //(look back at the activation_function parameter in nnsetup)
- //sigm is the sigmoid function; tanh_opt is a tanh variant that this toolbox modifies slightly:
- //tanh_opt is 1.7159*tanh(2/3.*A)
- switch nn.activation_function
- case 'sigm'
- // Calculate the unit's outputs (including the bias term)
- nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
- case 'tanh_opt'
- nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
- end
-
- //dropout computation; dropoutFraction is a parameter that can be set in nnsetup
- if(nn.dropoutFraction > 0)
- if(nn.testing)
- nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
- else
- nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
- nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
- end
- end
- //sparsity bookkeeping; nonSparsityPenalty is the penalty weight applied when activations miss sparsityTarget
- //calculate running exponential activations for use with sparsity
- if(nn.nonSparsityPenalty>0)
- nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
- end
-
- //Add the bias term
- nn.a{i} = [ones(m,1) nn.a{i}];
- end
- switch nn.output
- case 'sigm'
- nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
- case 'linear'
- nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
- case 'softmax'
- nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
- nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
- nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
- end
- //error and loss
- //compute the error between target and output
- nn.e = y - nn.a{n};
-
- switch nn.output
- case {'sigm', 'linear'}
- nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
- case 'softmax'
- nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
- end
- end
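For reference, the two activation helpers used above are small functions shipped with the toolbox (sigm.m and tanh_opt.m; exact paths may differ between versions). If I read them correctly, they are simply:
function X = sigm(P)
    X = 1./(1+exp(-P));      % logistic sigmoid
end
function f = tanh_opt(A)
    f = 1.7159*tanh(2/3.*A); % the scaled tanh recommended in LeCun's "Efficient BackProp"
end
(Each lives in its own file in MATLAB.)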
nnbp
Code: \NN\nnbp.m. nnbp carries out the back propagation step. The procedure is fairly standard and essentially matches the Neural Network section of UFLDL; the parts worth noting are again dropout and sparsity.
- if(nn.nonSparsityPenalty>0)
- pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
- sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
- end
-
- // Backpropagate first derivatives
- if i+1==n % in this case in d{n} there is not the bias term to be removed
- d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; // Bishop (5.56)
- else // in this case in d{i} the bias term has to be removed
- d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
- end
-
- if(nn.dropoutFraction>0)
- d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
- end
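Written out as an equation (my own summary of what the snippet computes), the delta of a hidden layer $i$ is
$$\delta^{(i)} = \Big(\delta^{(i+1)} W^{(i)} + \beta\big(-\tfrac{\rho}{\hat\rho_i} + \tfrac{1-\rho}{1-\hat\rho_i}\big)\Big)\odot f'\!\big(a^{(i)}\big),$$
where $\beta$ is nonSparsityPenalty, $\rho$ is sparsityTarget, $\hat\rho_i$ is the running mean activation nn.p{i}, and the bias column of $\delta^{(i+1)}$ is dropped whenever layer $i+1$ is not the output layer. When dropout is on, the same mask used in the forward pass zeroes the corresponding deltas.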
This is only the implementation detail: d{i} in the code is the delta of layer i, as explained in UFLDL, and dW{i} is essentially the computed gradient, with a few extra terms and adjustments applied afterwards. For the underlying theory see "Improving Neural Networks with Dropout" and the UFLDL note "Autoencoders and Sparsity".
nnapplygrads
Code file: \NN\nnapplygrads.m
- for i = 1 : (nn.n - 1)
- if(nn.weightPenaltyL2>0)
- dW = nn.dW{i} + nn.weightPenaltyL2 * nn.W{i};
- else
- dW = nn.dW{i};
- end
-
- dW = nn.learningRate * dW;
-
- if(nn.momentum>0)
- nn.vW{i} = nn.momentum*nn.vW{i} + dW;
- dW = nn.vW{i};
- end
-
- nn.W{i} = nn.W{i} - dW;
- end
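The update implemented by this loop can be summarized as (my notation):
$$\Delta W_i = \eta\,\big(dW_i + \lambda W_i\big), \qquad v_i \leftarrow m\,v_i + \Delta W_i, \qquad W_i \leftarrow W_i - v_i,$$
where $\eta$ is learningRate, $\lambda$ is weightPenaltyL2, $m$ is momentum and $v_i$ is nn.vW{i}; with momentum set to 0 this reduces to plain gradient descent with weight decay.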
This part is simple. nn.weightPenaltyL2 is the weight-decay term, another parameter that can be set in nnsetup; if it is non-zero, the weight penalty is added to guard against overfitting. The step is then scaled by the learning rate, adjusted according to the momentum, and finally nn.W{i} is updated.
nntest
nntest could not be simpler: it calls nnpredict and compares the result with the test labels.
- function [er, bad] = nntest(nn, x, y)
- labels = nnpredict(nn, x);
- [~, expected] = max(y,[],2);
- bad = find(labels ~= expected);
- er = numel(bad) / size(x, 1);
- end
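Note that both nntrain and nntest expect the labels as one-hot rows (which is why nntest uses max(y,[],2) to recover the class index). If your labels are plain integers 1..K, a small conversion helper could look like this (my own hypothetical helper, not part of the toolbox):
% turn integer labels 1..K into one-hot rows
function Y = to_onehot(labels, K)
    Y = zeros(numel(labels), K);
    Y(sub2ind(size(Y), (1:numel(labels))', labels(:))) = 1;
end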
nnpredict
Code file: \NN\nnpredict.m
- function labels = nnpredict(nn, x)
- nn.testing = 1;
- nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
- nn.testing = 0;
-
- [~, i] = max(nn.a{end},[],2);
- labels = i;
- end
Again very simple: predicting is just one pass of nnff to get the final output. max(nn.a{end},[],2) returns the maximum of each row together with its column index, so labels is simply the predicted class index.
(This test function appears to be meant specifically for classification problems; for other uses it is enough to know that nnff produces the final output.)
Summary
Overall the neural-network code is conventional and easy to follow, and differs little from the material in UFLDL, apart from the added dropout and denoising parts. This article does not aspire to explain all of that; it only offers a route you can follow through the code to deepen your understanding of the algorithms and how to apply them.
Every time my machine boots I have to press F1, and an online search never turned up the cause. Because the CPU fan was noisy, a Dell technician came on-site on 2013-11-22 and said it was because I was not using the discrete graphics card: if the monitor is attached via the white adapter so the discrete card is used, the problem goes away. I had been using the integrated graphics all along, which also consumes main memory. F8 could not enter safe mode either; on 2013-12-02 I phoned Dell support, and they only come on-site for hardware problems, not software ones.
E.g.
xlabel('$n$','interpreter','latex');
ylabel('${\gamma}$','interpreter','latex');
title('${\gamma}(n)$','interpreter','latex');
text(0.5,0.5, '$\int_{a}^{b}f(x)dx$', 'interpreter','latex');
Example 1: my program Flicker.m (all text rendered in LaTeX)
h = legend('ours','$l_p$-norm MKL',1);
set(h,'Interpreter','latex');
Example 2: my program AR_Sparse.m (only the formula text is LaTeX)
h = legend('l1','DLSR-FS',4);
h1 = findobj(get(h,'Children'),'String','l1');
set(h1,'String','$l_1$','Interpreter','LaTex');
--------------------------- How to use LaTeX in MATLAB's xlabel, ylabel, legend and text functions (the legend part below can be skipped; it is already covered by Example 2 above) ---------------------------
http://blog.sina.com.cn/s/blog_6e0693f70100nj22.html
The following example shows how to use LaTeX in MATLAB functions:
t = 0:0.1:2*pi;
x = sin(t);
hold on;
plot(t,x,'-*');
plot(t,2*x,'-.');
%% Using Latex in xlabel and ylabel
xlabel('$Time$','Interpreter','LaTex');
ylabel('$Value$','Interpreter','LaTex');
%% Using Latex in legend
h = legend('sin(x)__','2*sin(x)__');
h1 = findobj(get(h,'Children'),'String','sin(x)__');
set(h1,'String','$sin(\hat{x})$','Interpreter','LaTex');
h2 = findobj(get(h,'Children'),'String','2*sin(x)__');
set(h2,'String','$2*sin(\hat{x})$','Interpreter','LaTex');
%% Using Latex in text
text('Interpreter','latex',...
'String','$$\int_0^x\!\int_y dF(u,v)$$',...
'Position',[3 1],...
'FontSize',16);
%% Using Latex in title
title('$How \ to \ use \ latex \ in \ figure$','Interpreter','LaTex');
Cell arrays (cell array)
Element type: cell. A cell is accessed by index, and the contents of a cell by content addressing; a cell can hold an array of any type and any size.
Creating cells
cell() % create a cell array
c=cell(2); % create a 2x2 cell array
c=cell(m,n); % create an m-by-n cell array. cell() creates empty cells; its main purpose is to pre-allocate contiguous storage for the array, saving memory and improving efficiency (cf. the cppblog note on optimizing MATLAB programs by pre-allocating storage).
c_str=char('This is cell array');c_R=rand(3,3);c_comp=1+2i;c_sym=sym('sin(-3*t)*exp(-t)');
Two ways to assign: index assignment and content-addressing assignment
Index assignment: A(1,1)={c_str};A(1,2)={c_R};A(2,1)={c_comp};A(2,2)={c_sym};
Content-addressing assignment: B{1,1}=c_str;B{1,2}=c_R;B{2,1}=c_comp;B{2,2}=c_sym; % class({c_str}) is cell, while class(c_str) is char.
Accessing a cell array: B{1,2}, B{1,2}(1,2)
Question: I have a cell variable, say of size 3x1, each element holding a 1x9 matrix. Is there a quick MATLAB statement to reshape every 1x9 matrix into a 3x3 matrix, i.e. apply reshape to every element of the cell variable without writing a loop? Answer: A = {rand(1,9),rand(1,9),rand(1,9)}; cellfun(@(x) reshape(x, 3, 3).', A, 'UniformOutput', false)
cell2mat converts a cell array into a matrix. Example: A = {rand(1,9),rand(1,9),rand(1,9)};cell2mat(A)
Structure arrays (structure array)
Element type: structure
Accessed by field
A field can hold an array of any type and any size
Similar to a C struct
cell2struct converts between cell and struct. MATLAB provides two ways to define a structure: direct field reference and the struct function.
1. Defining a structure by direct reference. As with numeric arrays, a new struct object needs no prior declaration; it can be referenced directly and grown dynamically. For example, build a complex variable x:
x.real = 0; % create field 'real' and set it to 0
x.imag = 0  % create a new field 'imag' on x and set it to 0
x =
    real: 0
    imag: 0
It can then be grown dynamically into an array:
x(2).real = 0; % extend x into a 1x2 structure array
x(2).imag = 0;
Fields can also be added dynamically whenever needed, e.g. a field 'scale':
x(1).scale = 0;
All elements of x now have a scale field, but it is empty for every element other than x(1):
x(1) % inspect the fields of the first element of the structure array
ans =
     real: 0
     imag: 0
    scale: 0
2. Creating a structure with the struct function. The struct function also creates structures, generating them or converting other data into a structure array. Its usage is:
s = struct('field1',values1,'field2',values2,...);
This generates a structure array with the given field names and data. values1, values2, ... must have the same dimensions, and their entries are matched position by position. Assignment via struct makes use of cell arrays: values1, values2, ... may be cell arrays, scalar cells, or single values, and each values is assigned to the corresponding field. When a values is a cell array, the resulting structure array has the same dimensions as that cell array; when no cell array is involved, the result is 1x1. For example:
s = struct('type',{'big','little'},'color',{'blue','red'},'x',{3,4})
s =
1x2 struct array with fields:
    type
    color
    x
Reference: http://blog.sciencenet.cn/blog-436588-320694.html
See also p.31 of my textbook.
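cell2struct was mentioned above but not demonstrated; here is a minimal sketch of how it turns a cell array into a structure array (my own example values):
c = {'Tom', 32; 'Amy', 28};            % 2x2 cell: each row is one record
s = cell2struct(c, {'name','age'}, 2)  % convert along dimension 2 -> 2x1 struct array
s(1).name                              % returns 'Tom'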
MATLAB file I/O: see readFea.m and findImsFeaIdx.m under Blessing of Dimensionality\Features\code on my machine. http://blog.sina.com.cn/s/blog_4cfb5a6201015i8q.html
Example 1: path + file name: d:\moon.txt, with contents:
13,1,3.4
3,2.1,23
1,12,2
4,5.4,6
To read the data in moon.txt into an array, you can do:
fid=fopen('d:\moon.txt');
data=fscanf(fid,'%f,%f,%f',[3,inf]); % single quotes are required here
fclose(fid);
data then contains the following (fscanf fills column by column, so each file row becomes a column):
13    3     1    4
 1    2.1  12    5.4
 3.4 23     2    6
Example 2: the data are in d:\test.txt:
0.00 good 2
0.10 bot 3
1.02 yes 4
1.00 yes 5
1.00 yes 6
1.00 yes 3
1.00 yes 5
1.00 yes 6
1.00 yes 1
1.00 yes 3
1.00 yes 7
1.00 yes 3
1.00 yes 2
Program:
fid = fopen('d:\test.txt', 'r');
a = fscanf(fid, '%f %*s %d ', [2 inf]) % a now has two rows
fclose(fid);
a
Explanation: there are four spaces between the first and second column, so the format string must contain four spaces as well. There are three columns, i.e. three types, so three format specifiers are needed; %*s means the string field is skipped (not returned).
fid = fopen('E:\temp\test.txt', 'r');
a = fscanf(fid, '%f %*s %*f ', 5) % a now has five rows and one column; %*s and %*f mean those two fields are skipped
fclose(fid);
a
The result is:
a =
    0
    0.1000
    1.0200
    1.0000
    1.0000
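As an aside (not from the original notes), for a purely numeric, comma-separated file such as moon.txt above, dlmread reads it directly in row order, without a format string:
data = dlmread('d:\moon.txt', ','); % returns the 4x3 matrix exactly as laid out in the file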
Abstract: http://www.cs.toronto.edu/~kyros/local_outgoing/ICCV-Final-Results/ ICCV 2013 Results (29 Aug, 2013). Papers submitted: 1629. Withdrawals and administrative rejections: 128. Accepted as orals: 41 (2.52% oral...)
The compiled libsvm package is in E:\other\matlab 2007a\work\SVM\libsvm-mat-3.0-1 on my machine; on a 64-bit machine it has to be recompiled. Do not reuse a copy passed around by others, because the SVM code may have been modified -- for example Libing Wang modified his copy; the pristine version is E:\other\matlab 2007a\work\SVM\libsvm-mat-3.0-1.zip, downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/matlab/libsvm-mat-3.0-1.zip. Note that MATLAB's own toolbox also provides an svmtrain function. Libing put libsvm-mat-3.0-1 under C:\Program Files\MATLAB\R2010a\toolbox\ and then ran addpath and savepath. If you find you have to addpath and savepath again after every MATLAB restart, a workaround is to add the following line at the top of the program you want to run: addpath('C:\Program Files\MATLAB\R2010a\toolbox\libsvm-mat-3.0-1');
The README is very well written; its Examples section is fully understandable (including Precomputed Kernels: constructing a linear kernel matrix and then using the precomputed kernel gives exactly the same testing error as using the LIBSVM built-in linear kernel. A kernel is just a similarity; you can define whatever similarity you like).
(1) model = svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options']);
Setting libsvm_options:
Examples of options: -s 0 -c 10 -t 1 -g 1 -r 1 -d 3 Classify a binary data with polynomial kernel (u'v+1)^3 and C = 10
options:
-s svm_type : set type of SVM (default 0)
0 -- C-SVC
1 -- nu-SVC
2 -- one-class SVM
3 -- epsilon-SVR
	4 -- nu-SVR
What do these names stand for? C-SVC (C-support vector classification), nu-SVC (nu-support vector classification), one-class SVM (distribution estimation), epsilon-SVR (epsilon-support vector regression), nu-SVR (nu-support vector regression).
-t kernel_type : set type of kernel function (default 2)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1) -b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0) -wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)
The k in the -g option means the number of attributes in the input data.
(2) How to use a linear kernel?
matlab> % Linear Kernel
matlab> model_linear = svmtrain(train_label, train_data, '-t 0');
Strictly speaking, with a linear kernel the parameter c should also be tuned, just as with the Gaussian kernel. Libing Wang says c = 1 usually works well and tuning may not change much, though it depends on the data set. From the SVM objective one can see that a large c heavily penalizes the slack variables, pushing them toward 0, i.e. toward classifying the whole training set correctly; training accuracy is then very high but generalization is not necessarily good on the test set. A smaller c allows some boundary points to be misclassified, treating them as noise, which tends to generalize better.
(3) How to use a Gaussian (RBF) kernel?
matlab> load heart_scale.mat
matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07');
Why does the Gaussian-kernel SVM perform worse than the linear SVM here? Discussed with Libing on 2015-04-20; possible explanations: there are too few samples for a Gaussian kernel, and the parameter range searched was limited -- a wider range might give better results.
(4) How to do cross validation?
README文件有如下一句话:If the '-v' option is specified, cross validation is
conducted and the returned model is just a scalar: cross-validation
accuracy for classification and mean-squared error for regression.
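For instance, five-fold cross validation on the heart_scale data (a sketch based on the README's description of '-v'):
matlab> load heart_scale.mat
matlab> cv_accuracy = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07 -v 5'); % returns a scalar: the 5-fold CV accuracy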
(5) How to tune the two parameters of the Gaussian kernel?
Approach 1: tune the two parameters so that the test error on the training set itself is lowest, then use those parameters on the test set.
Problem with approach 1: Libing Wang says this overfits very easily, because it is easy to reach 100% accuracy on the training set while doing poorly on the test set. Approach 2 uses cross validation and generalizes better (cross validation randomly shuffles the training set, which helps generalization).
Approach 2: % E:\other\matlab 2007a\work\DCT\DCT_original\network.m
Problem with approach 2: for different data sets, in what ranges should these two parameters be searched -- is there any rule of thumb? Option 1: the gamma values used in network.m. Option 2: http://www.cppblog.com/guijie/archive/2010/12/02/135243.html. Another answer: besides cross validation on the training set, there is a further approach, similar to section 4.2 of "A Regularized Approach to Feature Selection for Face Detection" (ACCV 2007): split into training, validation and test sets. Libing says that paper's section 4.2 does nothing beyond this three-way split for tuning, and that cross validation on the training set is equivalent in spirit, since part of the training set is held out for validation anyway; a sketch of such a grid search follows below.
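A minimal grid-search sketch over (c, g) using the '-v' option (my own illustration; the log2 ranges are plausible defaults, not recommendations from the README):
% assumes train_label / train_data are already loaded
bestcv = 0;
for log2c = -5:2:15
  for log2g = -15:2:3
    cmd = ['-q -c ', num2str(2^log2c), ' -g ', num2str(2^log2g), ' -v 5'];
    cv = svmtrain(train_label, train_data, cmd);   % scalar: 5-fold CV accuracy
    if cv > bestcv
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
  end
end
fprintf('best c=%g, g=%g, CV accuracy=%g\n', bestc, bestg, bestcv);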
(6) How to use a precomputed kernel?
To use a precomputed kernel, you must include the sample serial number as the first column of the training and testing data (assume your kernel matrix is K, and the number of instances is n):
matlab> K1 = [(1:n)', K]; % include sample serial number as first column
matlab> model = svmtrain(label_vector, K1, '-t 4');
matlab> [predict_label, accuracy, dec_values] = svmpredict(label_vector, K1, model); % test the training data
We give the following detailed example by splitting heart_scale into 150 training and 120 testing data. Constructing a linear kernel matrix and then using the precomputed kernel gives exactly the same testing error as using the LIBSVM built-in linear kernel.
matlab> load heart_scale.mat
matlab>
matlab> % Split Data
matlab> train_data = heart_scale_inst(1:150,:);
matlab> train_label = heart_scale_label(1:150,:);
matlab> test_data = heart_scale_inst(151:270,:);
matlab> test_label = heart_scale_label(151:270,:);
matlab>
matlab> % Linear Kernel
matlab> model_linear = svmtrain(train_label, train_data, '-t 0');
matlab> [predict_label_L, accuracy_L, dec_values_L] = svmpredict(test_label, test_data, model_linear);
matlab>
matlab> % Precomputed Kernel
matlab> model_precomputed = svmtrain(train_label, [(1:150)', train_data*train_data'], '-t 4');
matlab> [predict_label_P, accuracy_P, dec_values_P] = svmpredict(test_label, [(1:120)', test_data*train_data'], model_precomputed);
matlab>
matlab> accuracy_L % Display the accuracy using linear kernel
matlab> accuracy_P % Display the accuracy using precomputed kernel
(7) How to get probability estimates?
For probability estimates, you need '-b 1' for training and testing:
matlab> load heart_scale.mat
matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07 -b 1');
matlab> load heart_scale.mat
matlab> [predict_label, accuracy, prob_estimates] = svmpredict(heart_scale_label, heart_scale_inst, model, '-b 1');
Without probability estimates:
matlab> load heart_scale.mat
matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07');
matlab> [predict_label, accuracy, dec_values] = svmpredict(heart_scale_label, heart_scale_inst, model); % test the training data
(8) Usage of svmpredict (from the libsvm-mat-2.9-1 README)
[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);
Input: testing_label_vector -- if the labels of the test data are unknown, simply use any random values (the type must be double). Once the model is fixed, the predicted labels are fixed; if you do not need the second output, accuracy, then testing_label_vector can be set arbitrarily, but if you want accuracy it must of course hold the true test labels. (In the code CLSlibsvmC of the Action recognition / ASLAN database, line 9 calls svmpredict with testing_label_vector set to ones(size(Samples,2),1), which is immaterial.)
Meaning of svmpredict's outputs:
predicted_label is a vector of predicted labels (hence lines 12 to 14 of CLSlibsvmC are unnecessary);
Xu Zhiwei I-Rising (285308540) 2015/5/31 21:56:30: Teachers and classmates, has anyone looked into how the decision value in SVM prediction is computed? Could someone explain? Thanks.
Su Songzhi-T-XMU (14291414) 2015/5/31 22:04:40: You mean libsvm? After optimization you have alpha_i for each support vector; then, for an input x, compute the kernel value k_i between x and each support vector, take sum(k_i*alpha_i*y_i) over all SUPPORT VECTORS, and add b -- exactly the textbook formula.
(Note: the above is completely understood; it corresponds to formula (2.54) on p.28 of Yang Guangzheng's textbook.)
Xu Zhiwei I-Rising (285308540) 2015/5/31 22:08:30: Is it the same for multi-class problems?
Su Songzhi-T-XMU (14291414) 2015/5/31 22:13:41: Yes. Multi-class uses 1-vs-1; the sv_coef inside the model is an m-by-n matrix, where m is the number of support vectors and n is the number of classes minus 1, and there is a rho vector; plug these into the formula to compute.
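For the binary, linear-kernel case this can be checked in a few lines (a sketch for a model trained with '-t 0', assuming model.Label(1) == +1; otherwise the sign flips):
w = model.SVs' * model.sv_coef;   % sum over support vectors of alpha_i*y_i*x_i
b = -model.rho;
dec_manual = test_data * w + b;   % should match the dec_values returned by svmpredict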
From the libsvm-mat-3.0-1 README: The function 'svmpredict' has three outputs. The first one, predicted_label, is a vector of predicted labels. The second output, accuracy, is a vector including accuracy (for classification), mean squared error, and squared correlation coefficient (for regression). The third is a matrix containing decision values or probability estimates (if '-b 1' is specified). If k is the number of classes, for decision values, each row includes the results of predicting k(k-1)/2 binary-class SVMs. For probabilities, each row contains k values indicating the probability that the testing instance is in each class. Note that the order of classes here is the same as the 'Label' field in the model structure.
(9) How does LibSVM implement multi-class classification, one-versus-rest or one-versus-one? (For the definitions of one-versus-rest and one-versus-one see page 4 (back) of my pattern recognition notes and p.47 of Sun Jixiang's textbook.) I asked Libing Wang and Junge Zhang; neither had looked into it closely. Based on the README sentence above about decision values and probabilities, my reading was that the probabilities are produced one-versus-rest (i.e. with the -b 1 option), and both of them thought that reading was plausible. Junge said that for the PASCAL and ImageNet competitions they simply train k SVMs (one-versus-rest; they do not use one-versus-one, which is too slow and probably performs about the same) rather than relying on SVM's built-in multi-class mechanism. 2013-09-10, reply from the LibSVM authors: "Libsvm implements only 1vs1. For 1vsrest, you can check the following libsvm faq: Q: LIBSVM supports 1-vs-1 multi-class classification. If instead I would like to use 1-vs-rest, how to implement it using MATLAB interface?" URL: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f808
Q: LIBSVM supports 1-vs-1 multi-class classification. If instead I would like to use 1-vs-rest, how to implement it using MATLAB interface?
Please use code in the following directory. The following example shows how to train and test the problem dna (training and testing).
Load, train and predict data:
[trainY trainX] = libsvmread('./dna.scale');
[testY testX] = libsvmread('./dna.scale.t');
model = ovrtrain(trainY, trainX, '-c 8 -g 4');
[pred ac decv] = ovrpredict(testY, testX, model);
fprintf('Accuracy = %g%%\n', ac * 100);
Conduct CV on a grid of parameters:
bestcv = 0;
for log2c = -1:2:3,
  for log2g = -4:2:1,
    cmd = ['-q -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = get_cv_ac(trainY, trainX, cmd, 3);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
  end
end
(10) How to get the accuracy in validation mode? See my program RVM\code\Yale\SVM\TestYale_SVM_2classes.
--------------------------------------------------------------------------------------------------------------------------------------------------------
http://blog.sina.com.cn/s/blog_64b046c701018c8n.html
A brief comparison of MATLAB's built-in svm functions and libsvm:
1. MATLAB's built-in svm implementation offers only the C-SVC model (C-support vector classification), whereas the libsvm toolbox offers C-SVC, nu-SVC, one-class SVM (distribution estimation), epsilon-SVR and nu-SVR.
2. MATLAB's built-in svm functions support only classification, not regression; libsvm supports both.
3. MATLAB's built-in svm functions support only binary classification (multi-class problems have to be coded up yourself following a multi-class scheme); libsvm supports multi-class via the 1-vs-1 algorithm.
4. With an RBF kernel, MATLAB's built-in svm function apparently cannot tune the kernel parameter gamma and seems to use only the default, whereas libsvm allows it to be tuned.
5. libsvm solves the quadratic programming problem with SMO; MATLAB's built-in svm implementation offers three QP solvers to choose from: the classical quadratic method, SMO, and least squares. (This is the only advantage of the built-in functions I have found so far.)
--------------------------------------------------------------------------------------------------------------------------------------------------------
SVM theory part: the derivation of the kernelized form of SVM (Eric Xing's course material) + sections 4.3 and 4.4 of M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples," J. Machine Learning Research, vol. 7, pp. 2399-2434, 2006 + Ensemble Manifold Regularization (TPAMI 2012).
"ZhuMLSS14.pdf" on my computer is very good introductory material.
http://blog.csdn.net/xiao_xia_/article/details/6906465
t-test: the t-test, also called Student's t-test, can be used to compare whether two samples come from the same distribution (i.e. to compare how well two groups of data are separated). It assumes normality of the data and indicates whether the difference between the means of the two samples is statistically significant. MATLAB provides two methods of the same form for this hypothesis test, ttest and ttest2; their parameters and return types are identical. The difference is that ttest performs a one-sample or paired-sample t-test, whereas ttest2 performs a two-sample t-test with pooled or unpooled variance estimate, i.e. an unpaired two-sample t-test. I have not entirely worked out the difference between paired and unpaired, but the Student's t-test article gives the following explanation: "Two-sample t-tests for a difference in mean involve independent samples, paired samples and overlapping samples. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to 'noise factors' that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study."
So, roughly, 'paired' takes noise factors into account. Comparing the results of ttest and ttest2 on the same two samples, ttest generally returns a smaller p and a narrower confidence interval ci.
Most common usage: [H,P,CI]=ttest2(x,y); (ttest is used the same way; the full form is [H,P,CI,STATS]=ttest2(x,y,ALPHA);) where x and y are row vectors, each representing one sample (for the paired ttest they must have the same length), and ALPHA is an optional significance level at which the test is performed (the test is at the 100*ALPHA % level). If ALPHA is not given it defaults to 0.05, i.e. the test asks whether x and y come from the same distribution at the 5% significance level (whether the null hypothesis is accepted).
Results: H=0 means the null hypothesis is not rejected at the 5% level (inferred from the fact that setting x=y returns H=0), i.e. statistically x and y can be regarded as coming from the same distribution; H=1 means the null hypothesis is rejected, i.e. statistically x and y are considered to come from different distributions -- they are separated. P is a probability; the MATLAB help explains it as "the p-value, i.e., the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. Small values of P cast doubt on the validity of the null hypothesis." For now I take it as the probability of observing such a result under the null distribution (not entirely sure). CI is the confidence interval, "a 100*(1-ALPHA)% confidence interval for the true difference of population means", i.e. an interval for the difference of means at confidence level 100*(1-ALPHA)%.
Application: this is often combined with k-fold cross validation to compare two algorithms. Suppose algorithms A1 and A2 produce results x and y, with mean(x)>mean(y); then run [h,p,ci]=ttest2(x,y). If h=1, one can assert statistically that A1's results are larger than A2's (the comparison of the two means is meaningful); h=0 means the mean values cannot be used to rank the two sets of results (the separation is too small).
I learned this in passing and have not tested it much, so it may not be entirely correct; corrections from experts are welcome.
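A tiny usage sketch (my own made-up numbers, just to show the call):
% accuracies of two algorithms over 10 cross-validation folds
x = [0.83 0.85 0.81 0.84 0.86 0.82 0.85 0.83 0.84 0.85];
y = [0.79 0.80 0.78 0.81 0.80 0.79 0.82 0.78 0.80 0.79];
[h, p, ci] = ttest2(x, y)  % h=1 here: the difference in mean accuracy is significant at the 5% level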
There is also a relevant slide in a PPT I came across (http://jura.wi.mit.edu/bio/education/hot_topics/pdf/matlab.pdf).
References:
Experience and my own understanding
The help documentation of MATLAB 7.11.0 (R2010b); Wikipedia; http://www.biosino.org/pages/newhtm/r/schtml/One_002d-and-two_002dsample-tests.html
http://blog.csdn.net/abcjennifer/article/details/8572994
It has been a long time since I wrote a study note. Early in the year I was busy with exams, courses and grading; now that I have settled down at home I have read the paper the "master" gave me last term, and wrestling with the math these past few days has felt good, so I am sharing it here and hope it is useful. Following Jeffrey's suggestion I am forcing myself to write the notes in English, which may be inconvenient to read -- please bear with me; reading more English never hurts. Thanks to the "master" for working through some of the big names' "easily achieved" stuff with me in the manuscript part of this article. The article is still immature and I am new to Robust PCA, so criticism and suggestions are very welcome.
Robust PCA -- Rachel Zhang
1. RPCA Brief Introduction
1. Why use Robust PCA? To handle spike noise of high magnitude, instead of Gaussian-distributed noise.
2. Main problem. Given C = A* + B*, where A* is a sparse spike-noise matrix and B* is a low-rank matrix, the aim is to recover B*. B* = UΣV', in which U∈R^(n*k), Σ∈R^(k*k), V∈R^(n*k).
3. Difference from PCA. Both PCA and Robust PCA aim at matrix decomposition. However, in PCA, M = L0 + N0, where L0 is a low-rank matrix and N0 is a small i.i.d. Gaussian noise matrix; it seeks the best rank-k estimate of L0 by minimizing ||M-L||_2 subject to rank(L) <= k, a problem that can be solved by SVD. In RPCA, M = L0 + S0, where L0 is a low-rank matrix and S0 is a sparse spike-noise matrix; we will try to give the solution in the following sections.
2. Conditions for correct decomposition
4. Ill-posedness. Suppose the sparse matrix A* and B* = e_i e_j^T are the solution of this decomposition problem. 1) If B* is allowed to be not only low rank but also sparse, another valid sparse-plus-low-rank decomposition would be A1 = A* + e_i e_j^T and B1 = 0. Therefore we need an appropriate notion of low rank that ensures B* is not too sparse; conditions will be imposed later requiring the space spanned by the singular vectors U and V (i.e., the row and column spaces of B*) to be "incoherent" with the standard basis. 2) Similarly, if A* is allowed to be sparse as well as low rank (e.g. the first column of A* is non-zero while all other columns are 0, so A* has rank 1 and is sparse), another valid decomposition would be A2 = 0, B2 = A* + B* (here rank(B2) <= rank(B*) + 1). Thus the sparse matrix must be assumed not to be low rank, i.e. each row/column should not have too many non-zero entries (no dense row/column), to avoid such issues.
5. Conditions for exact recovery / decomposition. If A* and B* are drawn from the following classes, then we have exact recovery with high probability [1]. 1) For the low-rank matrix L --- random orthogonal model [Candes and Recht 2008]: a rank-k matrix B* with SVD B* = UΣV' is constructed in this way: the singular vectors U, V∈R^(n*k) are drawn uniformly at random from the collection of rank-k partial isometries in R^(n*k). The singular vectors in U and V need not be mutually independent, and no restriction is placed on the singular values. 2) For the sparse matrix S --- random sparsity model: the matrix A* is such that support(A*) is chosen uniformly at random from the collection of all support sets of size m, and no assumption is made about the values of A* at the locations specified by support(A*). [support(M) denotes the locations of the non-zero entries in M.] The more recent [2] improved on these conditions and yields the "best" condition.
3. Recovery Algorithms
6. Formulation. For the decomposition D = A + E, in which A is low rank and the error E is sparse: 1) Intuitively one would propose min rank(A) + γ||E||_0 (1); however, this is non-convex and thus intractable (both terms are NP-hard to approximate). 2) Relax the L0-norm to the L1-norm and replace the rank with the nuclear norm: min ||A||* + λ||E||_1, where ||A||* = Σ_i σ_i(A) (2). This is convex, i.e., it has a unique minimizer. Reason: the relaxation is motivated by observing that ||A||* + λ||E||_1 is the convex envelope of rank(A) + γ||E||_0 over the set of (A,E) such that max(||A||_{2,2}, ||E||_{1,∞}) ≤ 1. Moreover, there are circumstances under which (2) perfectly recovers the low-rank matrix A0; [3] shows this is indeed true under surprisingly broad conditions.
7. Approaching the RPCA optimization algorithm. We can approach it in two different ways. The 1st approach uses a first-order method to solve the primal problem directly (e.g. Proximal Gradient or Accelerated Proximal Gradient (APG)); the computational bottleneck of each iteration is an SVD. The 2nd approach is to formulate and solve the dual problem and retrieve the primal solution from the dual optimal solution. The dual problem to RPCA can be written as max_Y trace(D^T Y), subject to J(Y) ≤ 1, where J(Y) = max(||Y||_2, λ^{-1}||Y||_∞); ||A||_x means the x-norm of A (the ∞-norm is the largest absolute value among the matrix entries). This dual problem can be solved by constrained steepest ascent. Now let us talk about the Augmented Lagrange Multiplier (ALM) and Alternating Directions Method (ADM) [2,4].
7.1. General method of ALM. For the optimization problem min f(X), subject to h(X) = 0 (3), we can define the Lagrange function L(X,Y,μ) = f(X) + <Y, h(X)> + (μ/2)||h(X)||_F^2 (4), where Y is a Lagrange multiplier and μ is a positive scalar. The general ALM scheme is: a generic Lagrange-multiplier algorithm solves PCP (principal component pursuit) by repeatedly setting X_k = arg min_X L(X, Y_k, μ) and then updating the Lagrange-multiplier matrix via Y_{k+1} = Y_k + μ h(X_k).
7.2. ALM algorithm for RPCA. In RPCA we can define (5) X = (A,E), f(X) = ||A||* + λ||E||_1, and h(X) = D - A - E. Then the Lagrange function is (6) L(A,E,Y,μ) = ||A||* + λ||E||_1 + <Y, D-A-E> + (μ/2)||D-A-E||_F^2. The optimization flow is just like the general ALM method. The initialization Y = Y0* is inspired by the dual problem, as it is likely to make the objective value <D, Y0*> reasonably large. Theorem 1: for algorithm 4, any accumulation point (A*, E*) of (A_k*, E_k*) is an optimal solution to the RPCA problem, and the convergence rate is at least O(μ_k^{-1}). [5]
In this RPCA algorithm an iteration strategy is adopted. Since the optimization process might be confusing, we state two well-known facts, used to optimize one variable while the other is fixed:
S_ε[W] = arg min_X ε||X||_1 + ½||X-W||_F^2 (7)
U S_ε[Σ] V^T = arg min_X ε||X||* + ½||X-W||_F^2, where W = UΣV^T is the SVD of W (8)
In these solutions S_ε[·] is the (element-wise) soft-thresholding operator. Below I work this out for our problem (to make writing the formulas and plots easier, the original post uses a handwritten manuscript).
BTW, S_u(X) is easily implemented in 2 lines:
S_u(X) = max(X-u, 0);
S_u(X) = S_u(X) + min(X+u, 0);
Now we apply formulations (7) and (8) to the RPCA problem. To get the optimal E for the objective function (6), we rewrite the objective, dropping the components unrelated to E:
f(E) = λ||E||_1 + <Y, D-A-E> + (μ/2)||D-A-E||_F^2
     = λ||E||_1 + <Y, D-A-E> + (μ/2)||D-A-E||_F^2 + (μ/2)||μ^{-1}Y||^2   // add a term irrelevant w.r.t. E
     = λ||E||_1 + (μ/2)(2μ^{-1}<Y, D-A-E> + ||D-A-E||_F^2 + ||μ^{-1}Y||^2)   // try to reach the form of (7)
     = λ||E||_1 + (μ/2)||E - (D-A+μ^{-1}Y)||_F^2,
which, up to the constant factor μ (which does not change the minimizer), is exactly the form of (7). So in the optimization step for E we have E = S_{λ/μ}[D-A+μ^{-1}Y], as in algorithm 4. Similarly one can show that the update of A is A = U S_{1/μ}[Σ] V^T, where UΣV^T is the SVD of (D-E+μ^{-1}Y).
8. Experiments. Here I have tested on video data. The data come from a fixed camera and the scene is the same most of the time, so the actively varying part can be regarded as the error E, and the stationary/invariant part serves as the low-rank matrix A. The resulting figure (in the original post) shows that as a person walks in, the error matrix picks up the change; its two subplots represent the low-rank matrix and the sparse one, respectively.
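Putting the A-step, E-step and dual update together, here is a minimal MATLAB sketch of the (inexact) ALM iteration described above -- my own illustration with ad-hoc parameter choices, not the code used for the experiment:
function [A, E] = rpca_alm_sketch(D)
% Minimal inexact-ALM sketch for D = A + E, with A low rank and E sparse.
[m, n] = size(D);
lambda = 1/sqrt(max(m, n));                          % standard choice of the sparsity weight
Y = D / max(norm(D, 2), norm(D(:), inf)/lambda);     % dual-inspired initialization of Y
mu = 1.25/norm(D, 2); rho = 1.5;                     % ad-hoc penalty schedule
A = zeros(m, n); E = zeros(m, n);
for k = 1:500
    % A-step: singular value thresholding of (D - E + Y/mu)
    [U, S, V] = svd(D - E + Y/mu, 'econ');
    A = U * max(S - 1/mu, 0) * V';
    % E-step: entrywise soft thresholding of (D - A + Y/mu)
    T = D - A + Y/mu;
    E = max(T - lambda/mu, 0) + min(T + lambda/mu, 0);
    % dual update and stopping test
    Z = D - A - E;
    Y = Y + mu*Z;
    mu = rho*mu;
    if norm(Z, 'fro') < 1e-7*norm(D, 'fro'), break; end
end
end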
9. References:
1) E. J. Candes and B. Recht. Exact Matrix Completion via Convex Optimization. Submitted for publication, 2008.
2) E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust Principal Component Analysis. Submitted for publication, 2009.
3) J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. In NIPS 2009.
4) X. Yuan and J. Yang. Sparse and Low-Rank Matrix Decomposition via Alternating Direction Methods. Preprint, 2009.
5) Z. Lin, M. Chen, L. Wu, and Y. Ma. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. Mathematical Programming, submitted, 2009.
6) Generalized Power Method for Sparse Principal Component Analysis.