老外是这么说的:
http://cbloomrants.blogspot.be/2011/07/07-14-11-compareexchangestrong-vs.html
07-14-11 - compare_exchange_strong vs compare_exchange_weak
The C++0x standard was revised a while ago to split compare_exchange (aka CAS) into two ops. A quick note on the difference :
bool compare_exchange_weak( T * ptr, T * old, T new );
bool compare_exchange_strong( T * ptr, T * old, T new );
(BTW in the standard "old" is actually a reference, which is a damn shitty thing to do because it makes it very non-transparent that "old" gets mutated by these functions, so I am showing it as a pointer).
both try to do :
atomically {
if ( *ptr == *old ) { *ptr = new; return true; }
else { *old = *ptr; return false; }
}
the difference is that compare_exchange_weak can also return false for spurious failure. (the original C++0x definition of CAS always allowed spurious failure; the new thing is the _strong version which doesn't).
If it returns due to spurious failure, then *old might be left untouched (and in fact, *ptr might be equal to *old but we failed anyway).
If spurious failure can only occur due to contention, then you can still gaurantee progress. In fact in the real world, I believe that LL-SC architectures cannot gaurantee progress, because you can get spurious failure if there is contention anywhere on the cache line, and you need that contention to be specifically on your atomic variable to gaurantee progress. (I guess if you are really worried about this, then you should ensure that atomic variables are padded so they get their own cache line, which is generally good practice for performance reasons anyway).
On "cache line lock" type architectures like x86, there is no such thing as spurious failure. compare_exchange just maps to "cmpxchg" instruction and you always get the swap that you want. (it can still fail of course, if the value was not equal to the old value, but it will reload old). (BTW it's likely that x86 will move away from this in the future, because it's very expensive for very high core counts)
compare_exchange_weak exists for LL-SC (load linked/store conditional) type architectures (Power, ARM, basically everything except x86), because on them compare_exchange_strong must be implemented as a loop, while compare_exchange_weak can be non-looping. For example :
On ARM, compare_exchange_weak is something like :
compare_exchange_weak:
ldrex // load with reservation
teq // test equality
strexeq // store if equal
and strexeq can fail for two reasons - either because they weren't equal, or because the reservation was lost (because someone else touched our cache line).
To implement compare_exchange_strong you need a loop :
compare_exchange_strong:
while ( ! compare_exchange_weak(ptr,old,new) ) { }
(note that you might be tempted to put a (*old = *ptr) inside the loop, but that's probably not a good idea, and not necessary, because compare_exchange_weak will eventually load *ptr into *old itself when it doesn't fail spuriously).
The funny bit is that when you use compare_exchange you often loop anyway. For example say I want to use compare_exchange_strong to increment a value, I have to do :
cur = *ptr;
while( ! compare_exchange_strong(ptr,&cur,cur+1) ) { }
(note it's a little subtle that this works - when compare_exchange_strong fails, it's because somebody else touched *ptr, so we then reload cur (this is why "old" is passed by address), so you then recompute cur+1 from the new value; so with the compare_exchange_strong, cur has a different value each time around this loop.)
But on an LL-SC architecture like ARM this becomes a loop on a loop, which is dumb when you could get the same result with a single loop :
cur = *ptr;
while( ! compare_exchange_weak(ptr,&cur,cur+1) ) { }
Note that with this loop now cur does *not* always take a new value each time around the loop (it does when it fails due to contention, but not when it fails just due to reservation-lost), but the end result is the same.
So that's why compare_exchange_weak exists, but you might ask why compare_exchange_strong exists. If we always use loops like this, then there's no need for it. But we don't always use loops like this, or we might want to loop at the much higher level. For example you might have something like :
bool SpinLock_TryLock(int * lock)
{
int zero = 0;
return compare_exchange_strong(lock,&zero,1);
}
which returns false if it couldn't get the lock (and then might do an OS wait) - you don't want to return false just because of a spurious failure. (that's not a great example, maybe I'll think of a better one later).
(BTW I think the C++0x stuff is a little bit broken, like most of C standardization, because they are trying to straddle this middle ground of exposing the efficient hardware-specific ways of doing things, but they don't actually expose enough to map directly to the hardware, and they also aren't high level enough to separate you from knowing about the hardware. For example none of their memory model actually maps directly to what x86 provides, therefore there are some very efficient x86-specific ways to do threading ops that cannot be expressed portable in C++0x. Similarly on LL-SC architectures, it would be preferrable to just have access to LL-SC directly.
I'd rather see things in the standard like "if LL-SC exist on this architecture, then they can be invoked via __ll() and __sc()" ; more generally I wish C had more conditionals built into the language, that would be so much better for real portability, as opposed to the current mess where they pretend that the language is portable but it actually isn't so you have to create your own mess of meta-language through #defines).