Below is a summary of the relevant flows and of how to handle socket error codes correctly in them.
I. Main problems encountered with Socket/Epoll:
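(1) With non-blocking sockets, the receive path (recv/recvfrom) treated EINTR/EAGAIN/EWOULDBLOCK as fatal errors, causing frequent disconnects.
(2) On EPOLLERR/EPOLLHUP events, the socket exception handler was called directly, causing frequent disconnects.
(3) When a UDP socket received a datagram of size 0, it was handled as an exception, which closed the socket.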
II. Summary of correct socket error-code handling in the main Socket/Epoll flows:
1. send/sendto and recv/recvfrom: do not treat EINTR/EAGAIN/EWOULDBLOCK as fatal.
| EINTR 4 - ("Interrupted system call") "The receive was interrupted by delivery of a signal before any data were available." |
The send/receive call was interrupted by a signal before it completed.
| EAGAIN 11 - ("Try again"), EWOULDBLOCK 11 - ("Resource temporarily unavailable"): "The socket is marked nonblocking and the receive operation would block, or a receive timeout had been set and the timeout expired before data was received. POSIX.1-2001 allows either error to be returned for this case, and does not require these constants to have the same value, so a portable application should check for both possibilities." |
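Sending: in non-blocking mode, send/sendto only copies the data into the protocol stack's send buffer. When the send buffer has no free space, it returns -1 with errno set to EAGAIN/EWOULDBLOCK (if only part of the data fits, it returns the number of bytes copied, and the rest must be sent later). In blocking mode with a send timeout set (SO_SNDTIMEO), it returns -1 with errno set to EAGAIN/EWOULDBLOCK when the timeout expires.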
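A minimal sketch of the sending side, assuming Linux and a stream socket. `send_some`, `set_nonblocking`, and the `send_status` values are hypothetical names for illustration. EINTR is retried, EAGAIN/EWOULDBLOCK is reported as "send buffer full" so the caller can keep the unsent tail, and anything else is treated as fatal. `MSG_NOSIGNAL` (Linux) makes a write to a closed peer fail with EPIPE instead of raising SIGPIPE.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch. */
enum send_status { SEND_ALL_QUEUED, SEND_BUFFER_FULL, SEND_FATAL };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/*
 * Copy as much of data[0..len) into the socket send buffer as fits.
 * *queued receives the number of bytes the kernel accepted.
 */
static enum send_status send_some(int fd, const char *data, size_t len,
                                  size_t *queued)
{
    *queued = 0;
    while (*queued < len) {
        ssize_t n = send(fd, data + *queued, len - *queued, MSG_NOSIGNAL);
        if (n >= 0) {
            *queued += (size_t)n;      /* may be a short write */
            continue;
        }
        if (errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return SEND_BUFFER_FULL;   /* not fatal: keep the rest for later */
        return SEND_FATAL;             /* e.g. EPIPE, ECONNRESET */
    }
    return SEND_ALL_QUEUED;
}
```

In an epoll loop, SEND_BUFFER_FULL would typically mean registering EPOLLOUT and resuming from `data + queued` once it fires.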
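Receiving: in non-blocking mode, when recv/recvfrom has no data to read, it does not block waiting for data to become ready; instead it returns -1 with errno set to EAGAIN/EWOULDBLOCK, telling the process to try again later. In blocking mode with a receive timeout set (SO_RCVTIMEO), it returns -1 with errno set to EAGAIN/EWOULDBLOCK when the timeout expires.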
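The receiving side can be sketched the same way, assuming a stream socket. `recv_once`, `set_recv_timeout_ms`, `set_nonblocking`, and the `recv_status` values are hypothetical. EAGAIN/EWOULDBLOCK covers both cases described above: a non-blocking socket with nothing to read, and a blocking socket whose SO_RCVTIMEO expired.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical outcome of one receive attempt. */
enum recv_status { RECV_DATA, RECV_TRY_LATER, RECV_PEER_CLOSED, RECV_FATAL };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Blocking socket: make recv() give up after ms milliseconds. */
static int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* One receive on a stream socket (cap must be > 0). */
static enum recv_status recv_once(int fd, void *buf, size_t cap, size_t *got)
{
    *got = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, cap, 0);
        if (n > 0) {
            *got = (size_t)n;
            return RECV_DATA;
        }
        if (n == 0)
            return RECV_PEER_CLOSED;   /* stream socket: orderly shutdown */
        if (errno == EINTR)
            continue;                  /* signal arrived first: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return RECV_TRY_LATER;     /* no data yet, or SO_RCVTIMEO expired */
        return RECV_FATAL;             /* e.g. ECONNRESET */
    }
}
```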
2. connect: do not treat EINTR/EINPROGRESS/EAGAIN as fatal.
| EINPROGRESS 115 - ("Operation now in progress") The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure). |
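When a client in non-blocking mode calls connect to reach the server, the call returns immediately while the TCP three-way handshake is still in progress, so it returns -1 with errno set to EINPROGRESS. This case must not be treated as an error; once the socket becomes writable, getsockopt SO_ERROR reports whether the connection succeeded or failed.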
| EAGAIN 11 - ("Try again") "No more free local ports or insufficient entries in the routing cache. For AF_INET see the description of /proc/sys/net/ipv4/ip_local_port_range ip(7) for information on how to increase the number of local ports." |
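Here the -1/EAGAIN return is caused by a temporary resource shortage, so it is advisable to retry several times before reporting a fatal error.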
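A sketch of a non-blocking TCP connect covering the three cases above, assuming IPv4 and using poll(2) to wait for writability (in an epoll loop, EPOLLOUT plays the same role). `tcp_connect_nonblocking` and `CONNECT_EAGAIN_ATTEMPTS` are hypothetical, and the retry count and 10 ms back-off are arbitrary choices for illustration, not values from the original.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical policy: how many times to try connect() while it says EAGAIN. */
#define CONNECT_EAGAIN_ATTEMPTS 3

/*
 * Connect a non-blocking TCP socket to *addr, waiting up to timeout_ms for
 * the handshake. Returns the connected fd, or -1 with errno set to the cause.
 */
static int tcp_connect_nonblocking(const struct sockaddr_in *addr, int timeout_ms)
{
    int rc, so_error = 0;
    socklen_t len = sizeof so_error;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    for (int attempt = 1;; ++attempt) {
        rc = connect(fd, (const struct sockaddr *)addr, sizeof *addr);
        if (rc == 0 || errno != EAGAIN || attempt == CONNECT_EAGAIN_ATTEMPTS)
            break;
        /* EAGAIN: temporary resource shortage, so back off briefly and retry. */
        struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */
        nanosleep(&pause, NULL);
    }
    if (rc == 0)
        return fd;                      /* connected immediately */
    /* EINPROGRESS: handshake still running. EINTR on a non-blocking connect
     * also completes asynchronously (calling connect() again would only give
     * EALREADY), so both cases wait for writability. */
    if (errno != EINPROGRESS && errno != EINTR)
        goto fail;                      /* includes EAGAIN after the last attempt */

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    do {
        rc = poll(&pfd, 1, timeout_ms); /* restarts the full timeout on EINTR */
    } while (rc < 0 && errno == EINTR);
    if (rc == 0)
        errno = ETIMEDOUT;
    if (rc <= 0)
        goto fail;

    /* Writable does not mean connected: SO_ERROR holds the real outcome. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        goto fail;
    if (so_error != 0) {
        errno = so_error;
        goto fail;
    }
    return fd;

fail:
    so_error = errno;                   /* close() may overwrite errno */
    close(fd);
    errno = so_error;
    return -1;
}
```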
3. accept: do not treat EINTR/ECONNABORTED/EPROTO as fatal.
| ECONNABORTED 103 - ("Software caused connection abort") A connection has been aborted. EPROTO 71 - ("Protocol error") Protocol error. |
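Both errors occur when an established TCP connection is aborted by the client before the server has accepted it. Berkeley-derived implementations handle the aborted connection entirely in the kernel, so the server process never sees it. Most SVR4 implementations, however, return an error to the process from accept, and which error is returned depends on the implementation: SVR4 returns EPROTO, while POSIX specifies ECONNABORTED.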
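A sketch of draining a non-blocking listening socket after epoll reports EPOLLIN, assuming IPv4 on loopback. `listen_loopback_nonblocking` and `accept_all` are hypothetical helpers. EINTR, ECONNABORTED, and EPROTO simply move on to the next pending connection, EAGAIN/EWOULDBLOCK ends the loop, and anything else is reported through `*fatal`.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking listener on 127.0.0.1, ephemeral port. */
static int listen_loopback_nonblocking(struct sockaddr_in *bound)
{
    socklen_t len = sizeof *bound;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    *bound = (struct sockaddr_in){ 0 };
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)bound, sizeof *bound) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Accept every pending connection, storing up to max_fds client fds.
 * Returns how many were stored; *fatal is set if a non-recoverable error
 * stopped the loop.
 */
static int accept_all(int listen_fd, int *fds, int max_fds, bool *fatal)
{
    int n = 0;
    *fatal = false;
    while (n < max_fds) {
        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd >= 0) {
            fds[n++] = cfd;
            continue;
        }
        if (errno == EINTR || errno == ECONNABORTED || errno == EPROTO)
            continue;          /* signal, or client aborted before accept */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;             /* accept queue drained */
        *fatal = true;         /* e.g. EBADF; EMFILE/ENFILE need their own policy */
        break;
    }
    return n;
}
```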
4. epoll_wait: handling EPOLLERR/EPOLLHUP events and socket errors.
| EPOLLERR Error condition happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. EPOLLHUP Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. |
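Most implementations call the error handler directly on EPOLLERR/EPOLLHUP, the read path on EPOLLIN, and the send path on EPOLLOUT, but this has potential problems.
When a socket error is discovered through epoll_wait raising EPOLLERR/EPOLLHUP, rather than in the read/write path, the connection is again mistakenly treated as abnormally disconnected. So ignoring the relevant error codes only in the read/send paths is not enough: when epoll_wait detects a socket error (EINTR/EAGAIN/EWOULDBLOCK…), it is still handled as a fatal error. (The front-end servers ran into a similar problem.)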
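Recommended approach:
(1) Handle non-fatal errors (EINTR/EAGAIN/EWOULDBLOCK…) properly in the read/send paths.
(2) On EPOLLERR/EPOLLHUP events, there are two options:
(2.1) Do not call the error handler; instead, call the read path just as for EPOLLIN, and let it confirm and handle the actual error.
(2.2) Use getsockopt SO_ERROR to fetch the specific error code, and filter out non-fatal errors.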
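Options (2.1) and (2.2) can also be combined, as in the sketch below, assuming level-triggered epoll and a non-blocking stream socket. `on_epoll_event`, `read_path`, `set_nonblocking`, and `conn_action` are hypothetical. On EPOLLERR/EPOLLHUP it first reads SO_ERROR (which also clears the pending error) and closes only if that error is fatal; otherwise it falls through to the read path, which drains any remaining data and closes only on EOF or a real error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical verdicts returned to the event loop. */
enum conn_action { CONN_KEEP, CONN_CLOSE };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Read path: drain the socket; close only on EOF or a genuinely fatal errno. */
static enum conn_action read_path(int fd, char *buf, size_t cap, size_t *got)
{
    while (*got < cap) {
        ssize_t n = recv(fd, buf + *got, cap - *got, 0);
        if (n > 0) {
            *got += (size_t)n;
            continue;
        }
        if (n == 0)
            return CONN_CLOSE;         /* peer closed; *got bytes still valid */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return CONN_KEEP;          /* drained for now */
        return CONN_CLOSE;             /* e.g. ECONNRESET */
    }
    return CONN_KEEP;                  /* buffer full; level-triggered epoll fires again */
}

/* Dispatch one epoll event for a connected socket. */
static enum conn_action on_epoll_event(int fd, uint32_t events,
                                       char *buf, size_t cap, size_t *got)
{
    *got = 0;
    if (events & (EPOLLERR | EPOLLHUP)) {
        /* Option (2.2): fetch the pending error; this also clears it. */
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            err = errno;
        if (err != 0 && err != EINTR && err != EAGAIN && err != EWOULDBLOCK)
            return CONN_CLOSE;
        /* Option (2.1): nothing fatal pending, so let the read path decide. */
        events |= EPOLLIN;
    }
    if (events & EPOLLIN)
        return read_path(fd, buf, cap, got);
    return CONN_KEEP;                  /* EPOLLOUT handling omitted */
}
```

A side benefit of routing EPOLLHUP through the read path: data the peer sent just before closing is still read instead of being dropped by an immediate close.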
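5. recv/recvfrom returning empty data must be handled by socket type.
(1) For TCP, recv returning 0 means the peer has closed the connection, and the corresponding close handling is required.
(2) For UDP, recvfrom returning a datagram of size 0 is normal behavior and must not be handled as an exception, because sending UDP datagrams with an empty (zero-length) payload is allowed.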
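A sketch of the distinction, assuming IPv4. `zero_read_means_peer_closed` and `udp_recv_one` are hypothetical helpers. The first asks the socket for its type via SO_TYPE, since only a stream socket has an end-of-stream; the second returns 0 for an empty-payload datagram and leaves the socket open.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: does a 0-byte read on fd mean "peer closed"? */
static bool zero_read_means_peer_closed(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return false;
    /* Only a stream (TCP) socket signals EOF with 0; a datagram socket can
     * legitimately receive an empty datagram. */
    return type == SOCK_STREAM;
}

/*
 * Receive one UDP datagram. Returns its payload size, which may be 0 for an
 * empty-payload datagram, or -1 on error (errno set; EINTR already retried).
 */
static ssize_t udp_recv_one(int fd, void *buf, size_t cap, struct sockaddr_in *from)
{
    for (;;) {
        socklen_t flen = sizeof *from;
        ssize_t n = recvfrom(fd, buf, cap, 0, (struct sockaddr *)from, &flen);
        if (n >= 0)
            return n;   /* 0 is a valid datagram: do not close the socket */
        if (errno != EINTR)
            return -1;  /* EAGAIN/EWOULDBLOCK on a non-blocking socket = nothing queued */
    }
}
```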
posted on 2019-08-14 19:11
长戟十三千