Linux Socket: How to detect disconnected network in a client program?

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14782143/

Date: 2020-08-06 18:58:43  Source: igfitidea

Tags: c, linux, sockets, send, disconnection

Asked by user2052197

I am debugging a C-based Linux socket program. Following the examples available on various websites, I used the following structure:

sockfd= socket(AF_INET, SOCK_STREAM, 0);

connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

send_bytes = send(sockfd, sock_buff, (size_t)buff_bytes, MSG_DONTWAIT);

I can detect the disconnection when the remote server closes its server program. But if I unplug the Ethernet cable, the send function still returns positive values rather than -1.

How can I check the network connection in a client program, assuming that I cannot change the server side?

Accepted answer by cnicutar

But if I unplug the Ethernet cable, the send function still returns positive values rather than -1.

First of all, you should know that send doesn't actually send anything; it's just a memory-copying function/system call. It copies data from your process to the kernel, and some time later the kernel fetches that data, packages it into segments and packets, and sends it to the other side. Therefore send can only return an error if:

  • The socket is invalid (for example bogus file descriptor)
  • The connection is clearly invalid, for example it hasn't been established or has already been terminated in some way (FIN, RST, timeout - see below)
  • There's no more room to copy the data

The main point is that send doesn't send anything, and therefore its return code doesn't tell you anything about data actually reaching the other side.

Back to your question: when TCP sends data, it expects a valid acknowledgement within a reasonable amount of time. If it doesn't get one, it resends. How often does it resend? Each TCP stack does things differently, but the norm is to use exponential backoff: first wait 1 second, then 2, then 4, and so on. On some stacks this process can take minutes.

The main point is that, in the case of an interruption, TCP declares a connection dead only after a seriously long period of silence (on Linux it does something like 15 retries, which takes more than 5 minutes).

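To see how quickly exponential backoff adds up, here is a small worked calculation. It assumes the timer doubles after every retransmission up to a cap of 120 seconds (Linux's TCP_RTO_MAX) and that the connection is dropped after 15 retransmissions; real stacks start from an RTT-based estimate, so the numbers are only indicative. The function name is hypothetical.

```c
/* Worked example of exponential backoff (all times in milliseconds).
 * Assumptions: the retransmission timer starts at initial_ms, doubles after
 * each retransmission, never exceeds max_ms, and the connection is dropped
 * after `retries` retransmissions, so retries + 1 timer periods elapse. */
long total_backoff_ms(long initial_ms, long max_ms, int retries)
{
    long total = 0;
    long rto = initial_ms;                            /* current timer value */
    for (int i = 0; i <= retries; i++) {
        total += rto;                                 /* wait out this period */
        rto = (rto > max_ms / 2) ? max_ms : rto * 2;  /* double, but cap it */
    }
    return total;
}
```

Starting at 1 second, as in the example above, total_backoff_ms(1000, 120000, 15) gives 1207000 ms (about 20 minutes). Starting at Linux's 200 ms minimum, total_backoff_ms(200, 120000, 15) gives 924600 ms, the 924.6-second figure that older versions of the tcp(7) man page quoted for tcp_retries2 = 15.
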
One way to solve this is to implement an acknowledgement mechanism in your application. For example, you could send the server a request meaning "reply within 5 seconds or I'll declare this connection dead" and then call recv with a timeout.

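A minimal sketch of that idea, assuming the server already answers some request in your existing protocol (the probe bytes are a placeholder, since the server can't be changed): set a receive timeout with SO_RCVTIMEO, send the request, and treat silence as a dead connection. probe_peer is a hypothetical name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Sends a request the server is expected to answer, then waits at most
 * timeout_sec seconds for any reply.
 * Returns 1 if a reply arrived, 0 if the peer is considered dead (no reply in
 * time, or it closed the connection), -1 if a system call failed (errno set). */
int probe_peer(int fd, const void *probe, size_t probe_len,
               void *reply, size_t reply_len, int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    /* Make recv() give up after timeout_sec instead of blocking forever. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;
    /* MSG_NOSIGNAL: report a broken connection as EPIPE, not SIGPIPE. */
    if (send(fd, probe, probe_len, MSG_NOSIGNAL) < 0)
        return -1;

    ssize_t n = recv(fd, reply, reply_len, 0);
    if (n > 0)
        return 1;                /* the server answered in time */
    if (n == 0)
        return 0;                /* orderly close by the server */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                /* timed out: declare the connection dead */
    return -1;
}
```

Because SO_RCVTIMEO stays set on the socket, later recv() calls on the same descriptor also time out after timeout_sec; reset it if that is not what you want.
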
Answer by Forhad Ahmed

To detect a remote disconnect, call read(): on a connected socket it returns 0 when the peer has closed the connection.

Check this thread for more info:

Can read() function on a connected socket return zero bytes?

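A small sketch of how the three possible read() outcomes can be told apart; the enum and function name are hypothetical. Keep the accepted answer's caveat in mind: read() returning 0 reports an orderly close by the peer, not a silently unplugged cable.

```c
#include <errno.h>
#include <unistd.h>

enum peer_state { PEER_DATA, PEER_CLOSED, PEER_ERROR };

/* Reads from a connected socket and reports what happened.
 * read() > 0: data arrived; read() == 0: the peer closed its end (FIN);
 * read() < 0: an error such as ECONNRESET or ETIMEDOUT (errno set).
 * *nread receives the raw read() result when it is not NULL. */
enum peer_state read_from_peer(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (nread != NULL)
        *nread = n;
    if (n > 0)
        return PEER_DATA;
    if (n == 0)
        return PEER_CLOSED;
    return PEER_ERROR;
}
```
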
Answer by Ramy Al Zuhouri

Check the return value: if send returns -1, see whether errno is equal to this value:

EPIPE
This socket was connected but the connection is now broken. In this case, send generates a SIGPIPE signal first; if that signal is ignored or blocked, or if its handler returns, then send fails with EPIPE.

Also handle the SIGPIPE signal (install a handler or ignore it) so that the behavior is under your control; by default, SIGPIPE terminates the process.

Answer by cloudrain21

You can't detect an unplugged Ethernet cable just by calling write(), because the TCP stack retransmits unacknowledged data in the background without your application knowing. Here are some solutions.

Even if you have already enabled the keepalive option on your application's socket, you can't detect the dead connection in time if your application keeps writing to the socket. That is because of TCP retransmission by the kernel's TCP stack. tcp_retries1 and tcp_retries2 are the kernel parameters that configure the TCP retransmission timeout. It's hard to predict the exact retransmission timeout because it is computed from the measured round-trip time (RTT). You can see this computation in RFC 793 (section 3.7, Data Communication).

https://www.rfc-editor.org/rfc/rfc793.txt

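For comparison, this is roughly how keepalive is enabled on a single Linux socket. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux socket options (some other systems have similar ones), and the timing values are up to you; enable_keepalive is a hypothetical name. As the answer warns, Linux does not send keepalive probes while sent data is still unacknowledged, so this mainly helps a connection that is idle, not one that keeps writing.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enables TCP keepalive on one socket with per-socket timings (Linux):
 * first probe after idle_sec seconds of silence, then one probe every
 * intvl_sec seconds, and the connection is dropped after cnt unanswered
 * probes. Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_keepalive(int fd, int idle_sec, int intvl_sec, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof idle_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_sec, sizeof intvl_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 10, 5, 3) would have the kernel start probing after 10 seconds of idleness and give up after 3 unanswered probes sent 5 seconds apart, roughly 25 seconds in total.
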
Each platform has its own kernel settings for TCP retransmission:

Linux: tcp_retries1, tcp_retries2 (in /proc/sys/net/ipv4)

http://linux.die.net/man/7/tcp

HP-UX: tcp_ip_notify_interval, tcp_ip_abort_interval

http://www.hpuxtips.es/?q=node/53

AIX: rto_low, rto_high, rto_length, rto_limit

http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf

If you want to detect a dead connection earlier, set a lower value for tcp_retries2 (default 15), but, as I said, the resulting timeout is not precise. In addition, you currently can't set these values for a single socket; they are global kernel parameters. There was an attempt to add a per-socket TCP retransmission option (http://patchwork.ozlabs.org/patch/55236/), but I don't think it was merged into the kernel mainline, and I can't find those option definitions in the system header files.

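Since tcp_retries2 is a global setting, a program can at most read it to estimate its own worst case. A minimal sketch, assuming the value lives at /proc/sys/net/ipv4/tcp_retries2 as on Linux; read_proc_int is a hypothetical helper.

```c
#include <stdio.h>

/* Reads a single integer kernel parameter from a /proc/sys file such as
 * /proc/sys/net/ipv4/tcp_retries2. Returns 0 on success, -1 on failure. */
int read_proc_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                          /* missing file or no permission */
    int ok = fscanf(f, "%d", value) == 1;   /* the file holds one number */
    fclose(f);
    return ok ? 0 : -1;
}
```

On Linux you would call read_proc_int("/proc/sys/net/ipv4/tcp_retries2", &v). Changing the value needs root (for example sysctl -w net.ipv4.tcp_retries2=5) and affects every TCP connection on the machine.
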
For reference, you can monitor your socket's keepalive timer with 'netstat --timers', as shown below. https://stackoverflow.com/questions/34914278

netstat -c --timer | grep "192.0.0.1:43245             192.0.68.1:49742"

tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (1.92/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (0.71/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (9.46/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (8.30/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (7.14/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (5.98/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (4.82/0/1)

In addition, when a keepalive timeout occurs, the poll event you receive differs by platform, so you must not decide that a connection is dead from the returned event alone. For example, HP-UX returns a POLLERR event while AIX returns only a POLLIN event when a keepalive timeout occurs. At that point, the recv() call fails with ETIMEDOUT.

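A sketch of checking the actual error instead of trusting the poll() flag. It swaps in getsockopt(SO_ERROR) to fetch the kernel's pending error (such as ETIMEDOUT) rather than calling recv() as described above; SO_ERROR reads and clears the pending error without touching any queued data. The function name is hypothetical.

```c
#include <poll.h>
#include <sys/socket.h>

/* Waits up to timeout_ms for the socket to become readable or to report an
 * error, then asks the kernel for the pending socket error instead of
 * trusting which poll() flag was set (HP-UX and AIX differ here).
 * Returns 0 if no error is pending, the pending errno value (for example
 * ETIMEDOUT) if there is one, or -1 if poll()/getsockopt() itself failed. */
int pending_socket_error(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) < 0)
        return -1;

    int err = 0;
    socklen_t len = sizeof err;
    /* SO_ERROR returns the pending error and clears it. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

An orderly close by the peer leaves no pending error, so a recv() that returns 0 still has to be checked separately.
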
On recent kernels (since 2.6.37), the TCP_USER_TIMEOUT option works well, and it can be set on a single socket.

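A minimal sketch of setting TCP_USER_TIMEOUT on one socket. The option takes an unsigned int in milliseconds; the fallback value 18 is taken from <linux/tcp.h> and only matters with old C library headers. set_user_timeout is a hypothetical name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18   /* value from <linux/tcp.h>, for old libc headers */
#endif

/* Linux >= 2.6.37: if data sent on this socket stays unacknowledged for more
 * than timeout_ms milliseconds, the kernel closes the connection and reports
 * ETIMEDOUT. 0 restores the system default. Returns 0 on success, -1 on error. */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
}
```

After set_user_timeout(fd, 10000), data that stays unacknowledged for about 10 seconds makes the kernel drop the connection, and the next send() or recv() reports ETIMEDOUT. According to tcp(7), when keepalive is also enabled, TCP_USER_TIMEOUT overrides it in deciding when to close the connection.
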
Finally, you can call recv() with the MSG_PEEK flag to check whether the socket is okay. (MSG_PEEK returns data that has already arrived in the kernel's receive buffer without removing it from that buffer, so a later read still gets the same data.) So you can use this flag just to check that the socket is okay, without any side effect.

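A sketch of that non-destructive check, with MSG_DONTWAIT added (an assumption) so it never blocks. The function name and its return convention are hypothetical. Note that "alive" here only means no error has been reported yet: an idle connection over an unplugged cable still looks alive until one of the timeouts above fires.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Non-destructive, non-blocking health check using recv() with MSG_PEEK:
 * any peeked byte stays in the receive buffer for the next real read.
 * Returns 1 if the socket looks alive (data waiting, or idle with no error),
 * 0 if the peer closed the connection, -1 on a socket error (errno set). */
int socket_looks_alive(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 1;                 /* data is waiting */
    if (n == 0)
        return 0;                 /* orderly shutdown by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;                 /* nothing to read yet, no error */
    return -1;                    /* e.g. ECONNRESET or ETIMEDOUT */
}
```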