[Repost] A summary of common Linux troubleshooting commands

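1: Checking CPU load -- mpstat

mpstat -P ALL [interval [count]]

The parameters are:

-P ALL monitors all CPUs

interval is the time in seconds between two consecutive samples

count is the number of samples to take

mpstat reads its data from /proc/stat.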

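The output fields are as follows (Δ denotes the change in a counter over the interval):

CPU: processor ID

user: percentage of CPU time spent in user mode during the interval, excluding niced processes (those with a positive nice value). Δuser/Δtotal*100

nice: percentage of CPU time spent in user mode on niced processes (those with a positive nice value) during the interval. Δnice/Δtotal*100

system: percentage of CPU time spent in kernel mode during the interval. Δsystem/Δtotal*100

iowait: percentage of time the CPU was idle while waiting for disk I/O during the interval. Δiowait/Δtotal*100

irq: percentage of CPU time spent servicing hardware interrupts during the interval. Δirq/Δtotal*100

soft: percentage of CPU time spent servicing software interrupts (softirqs) during the interval. Δsoftirq/Δtotal*100

idle: percentage of time the CPU was idle for any reason other than waiting for disk I/O during the interval. Δidle/Δtotal*100

intr/s: number of interrupts the CPU received per second during the interval. Δintr/interval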

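Total CPU time: total_cur = user + system + nice + idle + iowait + irq + softirq

total_pre = pre_user + pre_system + pre_nice + pre_idle + pre_iowait + pre_irq + pre_softirq

Δuser = user_cur - user_pre

Δtotal = total_cur - total_pre

Here _cur denotes the current value and _pre the value one interval earlier; the other counters are differenced the same way as user. All values in the list above are reported to two decimal places.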
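
The formulas above can be tried out by hand. The following is a minimal sketch, not mpstat's actual implementation: cpu_pct is a hypothetical helper name, and it assumes the first seven counters of the aggregate "cpu" line in /proc/stat are user, nice, system, idle, iowait, irq and softirq, in that order. Any later fields (such as steal) are ignored to match the formulas in this article.

```shell
#!/bin/sh
# Hypothetical helper: cpu_pct PRE CUR
# PRE and CUR are two "cpu ..." lines from /proc/stat taken one interval apart.
# Assumed field order: cpu user nice system idle iowait irq softirq [...]
cpu_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) pre[i] = $i }
        NR == 2 {
            total = 0
            # Delta of each counter, and their sum (the "total" in the formulas)
            for (i = 2; i <= 8; i++) { d[i] = $i - pre[i]; total += d[i] }
            if (total <= 0) { print "no elapsed CPU time"; exit 1 }
            printf "user=%.2f nice=%.2f system=%.2f idle=%.2f iowait=%.2f irq=%.2f soft=%.2f\n",
                d[2] * 100 / total, d[3] * 100 / total, d[4] * 100 / total,
                d[5] * 100 / total, d[6] * 100 / total, d[7] * 100 / total,
                d[8] * 100 / total
        }'
}

# Live usage on a Linux host (not executed here):
#   pre=$(grep '^cpu ' /proc/stat); sleep 1; cur=$(grep '^cpu ' /proc/stat)
#   cpu_pct "$pre" "$cur"
```

Passing the two samples as arguments rather than reading /proc/stat inside the function keeps the arithmetic separate from the sampling, so the same helper works on saved snapshots.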

2: Checking disk I/O and CPU load -- vmstat

usage: vmstat [-V] [-n] [delay [count]]

-V prints version.

-n causes the headers not to be reprinted regularly.

-a print inactive/active page stats.

-d prints disk statistics

-D prints disk table

-p prints disk partition statistics

-s prints vm table

-m prints slabinfo

-S unit size

delay is the delay between updates in seconds.

unit size k:1000 K:1024 m:1000000 M:1048576 (default is K)

count is the number of updates.

vmstat reads its data from /proc/meminfo, /proc/stat and /proc/*/stat.

The output fields are as follows:

FIELD DESCRIPTION FOR VM MODE

Procs

r: The number of processes waiting for run time.

b: The number of processes in uninterruptible sleep.

Memory

swpd: the amount of virtual memory used.

free: the amount of idle memory.

buff: the amount of memory used as buffers.

cache: the amount of memory used as cache.

inact: the amount of inactive memory. (-a option)

active: the amount of active memory. (-a option)

Swap

si: Amount of memory swapped in from disk (/s).

so: Amount of memory swapped to disk (/s).

IO

bi: Blocks received from a block device (blocks/s).

bo: Blocks sent to a block device (blocks/s).

System

in: The number of interrupts per second, including the clock.

cs: The number of context switches per second.

CPU

These are percentages of total CPU time.

us: Time spent running non-kernel code. (user time, including nice time)

sy: Time spent running kernel code. (system time)

id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.

wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.

st: Time spent in involuntary wait. Prior to Linux 2.6.11, shown as zero.
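
As a rough illustration of how the fields above are used in practice, the sketch below filters vmstat output for samples where the run queue (r) or the I/O wait percentage (wa) exceeds a limit. vmstat_flag is a hypothetical helper name, the thresholds are arbitrary, and the column positions assume the default VM-mode layout in which wa is the 16th column.

```shell
#!/bin/sh
# Hypothetical helper: vmstat_flag MAX_R MAX_WA
# Reads `vmstat` VM-mode output on stdin and prints the samples where
# r > MAX_R or wa > MAX_WA. Assumes the default 17-column layout:
# r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat_flag() {
    awk -v maxr="$1" -v maxwa="$2" '
        $1 !~ /^[0-9]+$/ { next }    # skip the two header lines
        ($1 + 0 > maxr + 0) || ($16 + 0 > maxwa + 0) {
            printf "r=%s b=%s wa=%s\n", $1, $2, $16
        }'
}

# Live usage (not executed here): five samples, one second apart,
# flagging r > 4 or wa > 20.
#   vmstat 1 5 | vmstat_flag 4 20
```

Note that the first sample vmstat prints reports averages since boot, so it is usually the later samples that matter.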

3: Checking memory usage -- free

usage: free [-b|-k|-m|-g] [-l] [-o] [-t] [-s delay] [-c count] [-V]

-b,-k,-m,-g show output in bytes, KB, MB, or GB

-l show detailed low and high memory statistics

-o use old format (no -/+buffers/cache line)

-t display total for RAM + swap

-s update every [delay] seconds

-c update [count] times

-V display version information and exit

[root@Linux /tmp]# free
             total       used       free     shared    buffers     cached
Mem:        255268     238332      16936          0      85540     126384
-/+ buffers/cache:      26408     228860
Swap:       265000          0     265000

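Mem: physical memory statistics.

-/+ buffers/cache: physical memory statistics adjusted for buffers and cache.

Swap: usage of the swap partition on disk, which we do not look at here.

The system has 255268 KB (256 MB) of physical memory in total, but the memory actually available is not the 16936 KB shown under free on the first line; that figure only counts memory that has not been allocated at all.

Line 1, Mem:

total: total physical memory.

used: total memory allocated, including buffers and cache, although part of that cache may not actually be in use.

free: memory that has not been allocated.

shared: shared memory; most systems rarely use it, so it is not discussed here.

buffers: memory the system has allocated to buffers but that is not in use.

cached: memory the system has allocated to cache but that is not in use. The difference between buffer and cache is described below.

total = used + free

Line 2, -/+ buffers/cache:

used: used from line 1 minus buffers and cached; this is the memory actually in use.

free: the sum of unused buffers, unused cache and unallocated memory; this is the memory actually available to the system right now.

free2 = buffers1 + cached1 + free1 // free2 is from line 2; buffers1, cached1 and free1 are from line 1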

Difference between buffer and cache

A buffer is something that has yet to be "written" to disk.

A cache is something that has been "read" from the disk and stored for later use.

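Operating system view versus application view:

From the operating system's point of view, the Mem line applies. Buffers and cached both count as used, so it considers only 16936 KB free.

From an application's point of view, the -/+ buffers/cache line applies. Buffers and cached are as good as available, because they exist only to speed up file access; when an application needs memory, they are reclaimed quickly.

So from an application's point of view, available memory = system free memory + buffers + cached.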
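
The arithmetic can be checked against the sample output above. This is a small sketch assuming the old free layout shown in this article, where the Mem line has the columns total, used, free, shared, buffers, cached; app_available_kb is a hypothetical helper name. Newer versions of free print a different set of columns, so the field positions would need adjusting there.

```shell
#!/bin/sh
# Hypothetical helper: reads old-format `free` output on stdin and prints
# free + buffers + cached from the Mem line (fields 4, 6 and 7).
app_available_kb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# With the sample line from this article:
#   printf 'Mem: 255268 238332 16936 0 85540 126384\n' | app_available_kb
# the result is 16936 + 85540 + 126384 = 228860, the free value on line 2.
```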

swap

Swap is the virtual memory partition on Linux. Once physical memory is used up, disk space (the swap partition) is used as if it were memory.

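4: Checking network interfaces -- sar

See man for details.

4.1: Checking network interface traffic: sar -n DEV delay count

The maximum traffic a server's network card can carry is determined by the card itself: 10M, 10/100 auto-negotiating, 100+ or 1G. Ordinary servers generally use 100 Mbit cards, and some use gigabit.

Output fields: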

IFACE

Name of the network interface for which statistics are reported.

rxpck/s

Total number of packets received per second.

txpck/s

Total number of packets transmitted per second.

rxbyt/s

Total number of bytes received per second.

txbyt/s

Total number of bytes transmitted per second.

rxcmp/s

Number of compressed packets received per second (for cslip etc.).

txcmp/s

Number of compressed packets transmitted per second.

rxmcst/s

Number of multicast packets received per second.
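
To relate rxbyt/s or txbyt/s to the card's capacity, the byte rate has to be converted to bits and compared with the link speed. The sketch below is only an illustration: link_util_pct is a hypothetical helper name, and it assumes the link speed is given in Mbit/s with 1 Mbit = 1,000,000 bits. Some sysstat versions report kilobytes per second instead of bytes, in which case the input needs scaling first.

```shell
#!/bin/sh
# Hypothetical helper: link_util_pct BYTES_PER_SEC LINK_MBPS
# Prints the share of the link's capacity used, as a percentage.
link_util_pct() {
    awk -v b="$1" -v m="$2" 'BEGIN {
        # bytes/s -> bits/s, divided by link capacity in bits/s, times 100
        printf "%.2f\n", b * 8 * 100 / (m * 1000000)
    }'
}

# Example: 6250000 bytes/s on a 100 Mbit/s card is 50 Mbit/s.
#   link_util_pct 6250000 100
```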

4.2: Checking network interface errors: sar -n EDEV delay count

Output fields:

IFACE

Name of the network interface for which statistics are reported.

rxerr/s

Total number of bad packets received per second.

txerr/s

Total number of errors that happened per second while transmitting packets.

coll/s

Number of collisions that happened per second while transmitting packets.

rxdrop/s

Number of received packets dropped per second because of a lack of space in linux buffers.

txdrop/s

Number of transmitted packets dropped per second because of a lack of space in linux buffers.

txcarr/s

Number of carrier-errors that happened per second while transmitting packets.

rxfram/s

Number of frame alignment errors that happened per second on received packets.

rxfifo/s

Number of FIFO overrun errors that happened per second on received packets.

txfifo/s

Number of FIFO overrun errors that happened per second on transmitted packets.

5: Finding the problem process -- top, ps

top -d delay; see man for details

ps aux shows detailed process information

ps axf shows the process tree
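
When top is not convenient (for example in a script), the output of ps aux can be sorted to find the heaviest processes. This is a minimal sketch: top_cpu is a hypothetical helper name, and it assumes the usual ps aux column order, in which PID is column 2, %CPU is column 3 and the command starts at column 11.

```shell
#!/bin/sh
# Hypothetical helper: top_cpu N
# Reads `ps aux` output on stdin and prints "%CPU PID COMMAND" for the
# N processes with the highest %CPU. Only the first word of the command
# is kept.
top_cpu() {
    awk 'NR > 1 { print $3, $2, $11 }' | LC_ALL=C sort -k1,1nr | head -n "$1"
}

# Live usage (not executed here):
#   ps aux | top_cpu 5
```

LC_ALL=C is set for sort so that the decimal point in the %CPU column is parsed the same way regardless of the system locale.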

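6: Checking which files a process is using -- lsof

Root privileges are needed to see everything; otherwise only what the logged-in user is permitted to see is shown.

lsof -p 77 // list the files opened by the process with PID 77

lsof -d 4 // show the processes using file descriptor 4

lsof abc.txt // show the processes that have the file abc.txt open

lsof -i :22 // show the processes using port 22

lsof -i tcp // show the processes using the TCP protocol

lsof -i tcp:22 // show the processes using TCP port 22

lsof +d /tmp // show the files under the directory /tmp that are opened by processes

lsof +D /tmp // same as above, but also searches subdirectories, so it takes longer

lsof -u username // show the files opened by processes owned by the user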

7: Tracing a running program -- strace

usage: strace [-dffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]

[-p pid] ... [-s strsize] [-u username] [-E var=val] ...

[command [arg ...]]

or: strace -c [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...

[command [arg ...]]

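Common options:

-f: trace child processes in addition to the current process.

-c: count the time, number of calls and number of errors for each system call.

-o file: write the output to file instead of standard error (stderr).

-p pid: attach to the running process with the given pid. This option is often used to debug background processes.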

8: Checking disk usage -- df

test@wolf:~$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1        3945128  1810428   1934292  49% /
udev              745568       80    745488   1% /dev
/dev/sda3       12649960  1169412  10837948  10% /usr/local
/dev/sda4       63991676 23179912  37561180  39% /data
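
On a machine with many mounts it can help to list only the filesystems above a given usage level. The sketch below does that for df output in the six-column layout shown above; df_over is a hypothetical helper name and the threshold is arbitrary. Using df -P is suggested because it keeps each filesystem on a single line.

```shell
#!/bin/sh
# Hypothetical helper: df_over LIMIT
# Reads `df` output on stdin and prints "mountpoint use%" for every
# filesystem whose Use% is strictly greater than LIMIT.
df_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "49%" -> "49"
        if (use + 0 > limit + 0) print $6, $5
    }'
}

# Live usage (not executed here):
#   df -P | df_over 80
```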

9: Checking network connections -- netstat

Common usage: netstat -lpn

Options:

-p, --programs display PID/Program name for sockets

-l, --listening display listening server sockets

-n, --numeric don't resolve names

-a, --all, --listening display all sockets (default: connected)
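
A frequent use of netstat -lpn is finding out which process is listening on a particular port. The following is a sketch under assumptions: listeners_on_port is a hypothetical helper name, only TCP lines are handled, and the column layout is assumed to be Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name.

```shell
#!/bin/sh
# Hypothetical helper: listeners_on_port PORT
# Reads `netstat -lpn` output on stdin and prints the local address and
# PID/program of every TCP socket listening on PORT.
listeners_on_port() {
    awk -v port="$1" '$1 ~ /^tcp/ && $6 == "LISTEN" {
        n = split($4, parts, ":")    # the port is the last ":"-separated piece
        if (parts[n] == port) print $4, $7
    }'
}

# Live usage (not executed here; root is needed to see every PID):
#   netstat -lpn | listeners_on_port 22
```

Splitting on ":" and taking the last piece lets the same check work for both IPv4 addresses such as 0.0.0.0:22 and IPv6 forms such as :::22.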

Posted on 2010-11-21 12:25 by 豪
