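Yesterday a customer reported that the archive log directory on one RAC node had become read-only. Archived logs could no longer be created, so the redo logs could not switch either; fortunately this was RAC. The customer said everything worked again after a reboot, but a short while later the directory went read-only once more. My first guess was a mount problem, so I went through the root user's command history. The mounts were indeed somewhat messy, but after ruling out the suspicious parts the directory was still read-only. So I turned to the system log. Because of ORACLE BUG 5722352, the system log was filled with messages like these: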
Feb 12 10:16:57 su(pam_unix)[28104]: session opened for user oracle by (uid=0)
Feb 12 10:16:57 su(pam_unix)[28104]: session closed for user oracle
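With that much noise there was no shortcut: I had the customer extract 300,000 lines, and only then did I manage to locate the boot messages and, from there, some useful information: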
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110988
Jul 9 16:15:38 dbrac2 kernel: Aborting journal on device sdh1.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110989
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110990
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110991
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110992
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110993
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110994
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110995
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110996
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110997
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110998
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110999
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_orphan_del: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_delete_inode: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: ext3_abort called.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_journal_start_sb: Detected aborted journal
Jul 9 16:15:38 dbrac2 kernel: Remounting filesystem read-only
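The log shows that the kernel itself remounted sdh1 (/arch02) read-only, and the lines above it show why: the ext3 filesystem on that disk hit errors and its journal was aborted. This is a protective mechanism of the Linux kernel. So why did a reboot make things work again?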
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Jul 8 00:20:37 dbrac2 kernel: EXT3 FS on sdh1, internal journal
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: recovery complete.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
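This is the only place I could find an explanation. At mount time the kernel notes the error recorded from the previous mount, recommends running e2fsck, completes journal recovery, and then mounts the filesystem normally anyway. So it apparently works until the damaged area is touched again, at which point the journal is aborted and the filesystem goes read-only once more.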
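The log search described above can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, and the log file path in the usage note is an assumption, since the original does not say which file was searched.

```shell
#!/bin/sh
# Hypothetical helpers for digging filesystem events out of a noisy syslog.
# Both read log text on stdin so they can be chained in a pipeline.

# Drop the su(pam_unix) session noise and keep only kernel lines about
# ext3 errors/warnings, journal aborts, and read-only remounts.
filter_fs_events() {
    grep -v 'su(pam_unix)' |
        grep -E 'kernel: (EXT3-fs (error|warning)|Aborting journal|ext3_abort|Remounting filesystem read-only)'
}

# Summarise "EXT3-fs error (device X)" lines as "device count" pairs.
count_errors_by_device() {
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' |
        sort | uniq -c | awk '{print $2, $1}'
}
```

Assuming the log lives in /var/log/messages, `filter_fs_events < /var/log/messages` would print only the relevant kernel lines, and piping that into `count_errors_by_device` shows which device is affected.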
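I did not repair the filesystem with fsck. The errors were fairly severe, the archived logs on it all dated from before the 7th, and they could not be copied off anyway. In the end, for the sake of stable operation going forward, I reformatted sdh1 and remounted it, and everything was fine.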
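The two recovery paths mentioned here, repairing with fsck or reformatting, might look like the sketch below. It only prints the commands instead of running them, because reformatting destroys everything on the device. The function names are hypothetical, and the use of e2fsck -f and mkfs.ext3 is an assumption based on the filesystem being ext3; the original does not list the exact commands used.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: print the commands, do not execute them.
# Usage: plan_repair DEVICE MOUNTPOINT / plan_reformat DEVICE MOUNTPOINT

# Path A: repair in place. The filesystem must be unmounted before e2fsck.
plan_repair() {
    dev=$1; mp=$2
    echo "umount $mp"
    echo "e2fsck -f $dev"
    echo "mount $dev $mp"
}

# Path B (the one chosen in this case): recreate the filesystem.
# WARNING: mkfs destroys all data on the device.
plan_reformat() {
    dev=$1; mp=$2
    echo "umount $mp"
    echo "mkfs.ext3 $dev"
    echo "mount $dev $mp"
}
```

For the device in this incident, `plan_reformat /dev/sdh1 /arch02` prints the three commands in order. Note that a freshly created filesystem's root directory belongs to root, so ownership would need to be given back to the oracle user before archiving can resume.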
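When you run into this kind of problem, there are several things to check:
1. Whether there is enough free space
2. Whether there are enough free inodes
3. Whether the directory's permissions or owner have been changed
4. Whether the mount itself is wrong. Filesystems are mounted read-write by default; a read-only mount can normally be switched back with mount -o remount,rw <mount point>
5. Whether the system log shows disk errors
6. When this error appears, a hardware fault is fairly likely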
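The checklist above could be scripted along these lines. This is a minimal sketch, not a complete diagnostic: the function names are invented for this example, the directory passed in must be the mount point itself, and reading kernel messages with dmesg may require root on some systems.

```shell
#!/bin/sh
# mount_is_read_only MOUNTPOINT
# Reads /proc/mounts-style text on stdin and succeeds (exit 0) when the
# given mount point's option list contains the "ro" option.
mount_is_read_only() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            ro = 0
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { exit (ro ? 0 : 1) }'
}

# check_archive_dir MOUNTPOINT: run checks 1 to 5 from the list.
check_archive_dir() {
    dir=$1
    df -h "$dir"     # 1. free space
    df -i "$dir"     # 2. free inodes
    ls -ld "$dir"    # 3. owner and permissions
    if mount_is_read_only "$dir" < /proc/mounts; then   # 4. mount state
        echo "$dir is mounted read-only"
    fi
    # 5. kernel messages about filesystem errors
    dmesg | grep -E 'EXT3-fs error|Remounting filesystem read-only'
}
```

In this incident the call would be `check_archive_dir /arch02`. Splitting the option list on commas avoids false positives from options such as errors=remount-ro, which merely contain the letters "ro".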