S.l.e!ep.¢%

像打了激速一样,以四倍的速度运转,开心的工作
简单、开放、平等的公司文化;尊重个性、自由与个人价值;
posts - 1098, comments - 335, trackbacks - 0, articles - 1
  C++博客 :: 首页 :: 新随笔 :: 联系 :: 聚合  :: 管理

IOCP使用时常见的几个错误

Posted on 2009-09-12 13:17 S.l.e!ep.¢% 阅读(1943) 评论(0)  编辑 收藏 引用 所属分类: IOCP

IOCP使用时常见的几个错误

Posted on 2009-09-12 00:20 Fox 阅读(25) 评论(0)  编辑 收藏引用

本文同步自游戏人生

在使用IOCP时,最重要的几个API就是GetQueueCompeltionStatus、WSARecv、WSASend,数据的I/O及其完成状态通过这几个接口获取并进行后续处理。

GetQueueCompeltionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.

						BOOL WINAPI GetQueuedCompletionStatus(
  __in   HANDLE CompletionPort,
  __out  LPDWORD lpNumberOfBytes,
  __out  PULONG_PTR lpCompletionKey,
  __out  LPOVERLAPPED *lpOverlapped,
  __in   DWORD dwMilliseconds
);
				

If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.

除了关心这个API的in & out(这是MSDN开头的几行就可以告诉我们的)之外,我们更加关心不同的return & out意味着什么,因为由于各种已知或未知的原因,我们的程序并不总是有正确的return & out。

If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.

假设我们指定dwMilliseconds为INFINITE。

这里常见的几个错误有:

WSA_OPERATION_ABORTED (995): Overlapped operation aborted.

由于线程退出或应用程序请求,已放弃I/O 操作。

MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.

成因分析:这个错误一般是由于peer socket被closesocket或者WSACleanup关闭后,针对这些socket的pending overlapped I/O operation被中止。

解决方案:针对socket,一般应该先调用shutdown禁止I/O操作后再调用closesocket关闭。

严重程度轻微易处理

WSAENOTSOCK (10038): Socket operation on nonsocket.

MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.

成因分析:在一个非套接字上尝试了一个操作。

使用closesocket关闭socket之后,针对该invalid socket的任何操作都会获得该错误。

解决方案:如果是多线程存在对同一socket的操作,要保证对socket的I/O操作逻辑上的顺序,做好socket的graceful disconnect。

严重程度轻微易处理

WSAECONNRESET (10054): Connection reset by peer.

远程主机强迫关闭了一个现有的连接。

MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

成因分析:在使用WSAAccpet、WSARecv、WSASend等接口时,如果peer application突然中止(原因如上所述),往其对应的socket上投递的operations将会失败。

解决方案:如果是对方主机或程序意外中止,那就只有各安天命了。但如果这程序是你写的,而你只是hard close,那就由不得别人了。至少,你要知道这样的错误已经出现了,就不要再费劲的继续投递或等待了。

严重程度轻微易处理

WSAECONNREFUSED (10061): Connection refused.

由于目标机器积极拒绝,无法连接。

MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.

成因分析:在使用connect或WSAConnect时,服务器没有运行或者服务器的监听队列已满;在使用WSAAccept时,客户端的连接请求被condition function拒绝。

解决方案:Call connect or WSAConnect again for the same socket. 等待服务器开启、监听空闲或查看被拒绝的原因。是不是长的丑或者钱没给够,要不就是服务器拒绝接受天价薪酬自主创业去了?

严重程度轻微易处理

WSAENOBUFS (10055): No buffer space available.

由于系统缓冲区空间不足或列队已满,不能执行套接字上的操作。

MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

成因分析:这个错误是我查看错误日志后,最在意的一个错误。因为服务器对于消息收发有明确限制,如果缓冲区不足应该早就处理了,不可能待到send/recv失败啊。而且这个错误在之前的版本中几乎没有出现过。这也是这篇文章的主要内容。像connect和accept因为缓冲区空间不足都可以理解,而且危险不高,但如果send/recv造成拥堵并恶性循环下去,麻烦就大了,至少说明之前的验证逻辑有疏漏。

WSASend失败的原因是:The Windows Sockets provider reports a buffer deadlock. 这里提到的是buffer deadlock,显然是由于多线程I/O投递不当引起的。

解决方案:在消息收发前,对最大挂起的消息总的数量和容量进行检验和控制。

严重程度严重

本文主要参考MSDN

************* 说明 *************

Fox只是对自己关心的几个错误和API参照MSDN进行分析,不提供额外帮助。


只有注册用户登录后才能发表评论。
网站导航: 博客园   IT新闻   BlogJava   知识库   博问   管理