Debug a kernel module with BUG() or BUG_ON(). In the code, add a check
such as if (impossible condition) BUG(); or, equivalently,
BUG_ON(impossible condition);. When the check fires, the kernel prints a
trace and dies, so you get a chance to find out early that something bad
happened.
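A minimal sketch of such a check, assuming a hypothetical find_min() helper (the name is borrowed from the example trace in these notes; the body is made up for illustration). The #else branch is only a user-space stand-in so the sketch can be compiled outside a kernel tree; it is not how the kernel implements BUG().

```c
#ifdef __KERNEL__
#include <linux/bug.h>          /* BUG(), BUG_ON() */
#else
/* User-space stand-ins (assumption, for trying the sketch outside the kernel). */
#include <stdio.h>
#include <stdlib.h>
#define BUG() do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)
#endif

/* Hypothetical helper: return the smallest of n values. */
static int find_min(const int *vals, int n)
{
    int i, min;

    /* Callers must never pass an empty array; crash loudly if they do. */
    BUG_ON(vals == NULL || n <= 0);

    min = vals[0];
    for (i = 1; i < n; i++)
        if (vals[i] < min)
            min = vals[i];
    return min;
}
```

Keep the condition free of side effects, so that the check can be removed later without changing behavior.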
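After the crash, write down the EIP value, function name, and offset shown
on the console. For example: EIP [e09d76d0] find_min+0x60/0x240. Here 0x60
is the offset of the faulting instruction from the start of find_min, and
0x240 is the total size of the function. Of course, you also want to write
down the stack trace.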
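To make the symbol+offset/size notation concrete, here is a small user-space sketch that splits such a string into its parts. The struct and function names are hypothetical and not part of any kernel tool; it only shows that 0x60 is decimal 96, the value used with GDB.

```c
#include <stdio.h>

/* Hypothetical container for one "symbol+offset/size" location. */
struct oops_loc {
    char symbol[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse e.g. "find_min+0x60/0x240"; returns 0 on success, -1 on error. */
static int parse_oops_loc(const char *s, struct oops_loc *out)
{
    /* %lx accepts an optional 0x prefix, as printed in the trace. */
    if (sscanf(s, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

For the example above, the parsed offset is 96 and the size is 576 bytes.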
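Now use objdump -d -S module.ko to disassemble your kernel module. Find
the offending function (e.g., find_min), go to the offset within it
(e.g., 0x60), and see which C code causes the problem. Note that -S can
interleave source lines only if the module was built with debug
information.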
Another way is GDB: run gdb module.ko, use disassemble find_min to show
the assembly, list *find_min+96 to see the source line (96 is 0x60 in
decimal), and dir kernel_include_dir to add the kernel header path to
GDB's source search path.
For a more detailed walkthrough, read How To Locate An Oops
by Denis Vlasenko.
SMP spin lock issues: pretty much everything you need to know can be
found in Documentation/spinlocks.txt in the kernel source (newer trees
keep it at Documentation/locking/spinlocks.rst). Typical use is
spin_lock_irqsave() ... spin_unlock_irqrestore(). Disabling local
interrupts keeps an interrupt handler on the same CPU from interfering,
and the spin lock itself ensures safety across processors. The catch,
however, is cache coherency and speculative execution on the CPUs.
Simply put, spin_lock() is a barrier (it uses the lock prefix), but
spin_unlock() is not. On x86, writes are ordered, but speculative reads
are possible and could cause a problem when spin_unlock() is followed by
spin_lock().
See this email
for some discussion.
If a problem occurs, consider smp_mb(), smp_wmb(), and smp_rmb() (full,
write, and read memory barriers, respectively).
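As a rough illustration of where smp_wmb() and smp_rmb() go, here is the usual publish/consume pairing. The variable and function names are hypothetical. The #else branch maps both barriers to a full compiler-provided barrier (__sync_synchronize) purely as a user-space stand-in, which is stronger than what the kernel primitives do on x86.

```c
#ifdef __KERNEL__
#include <linux/compiler.h>     /* READ_ONCE(), WRITE_ONCE() */
#include <asm/barrier.h>        /* smp_wmb(), smp_rmb() */
#else
/* User-space stand-ins (assumption): full barriers via a GCC/Clang builtin. */
#define smp_wmb() __sync_synchronize()
#define smp_rmb() __sync_synchronize()
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#endif

static int payload;             /* hypothetical data being handed over */
static int ready;               /* flag: payload is valid */

/* Writer side: store the data first, then set the flag. */
static void publish(int value)
{
    payload = value;
    smp_wmb();                  /* order the payload store before the flag store */
    WRITE_ONCE(ready, 1);
}

/* Reader side: returns 1 and fills *out if data was published, else 0. */
static int try_consume(int *out)
{
    if (!READ_ONCE(ready))
        return 0;
    smp_rmb();                  /* order the flag load before the payload load */
    *out = payload;
    return 1;
}
```

The two barriers only work as a pair: a write barrier on one CPU has no effect unless the reader on the other CPU uses a matching read barrier.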